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1. INTRODUCTION 


The objective of this study is to find an algorithm which 
accurately and efficiently estimates production and consumption 

patterns in the wheat market when various information con- 
ditions are available to the participants. Such an algorithm could be 
used to estimate, given a suitable definition of overall "welfare," 
the net value to society of an improved wheat forecasting program 
such as the NASA LANDSAT system. Of course, the accuracy of the value 
estimates depends on both the information supplied to the program, 
viz.: the economic model, and the definition of "welfare," as well as 

the operation of the algorithm itself. An economic model of the wheat 
sector which is suitable for this problem has been developed by ECON [1], 
and our work has been limited to 1) precise formulation of the model 
within a stochastic control framework, 2) choice and development of a 
suitable algorithm to solve the model. 

The theoretical foundation on which our work is based is the results 
of stochastic control theory (Witsenhausen [2]) and Markov Decision 
Theory (Howard [3], Schweitzer [4], Odoni [5], and Varaiya [6]), in- 
cluding some new theoretical results on discounted Markov Processes 
which will be presented here (and also in Jones [7]). The major 
theoretical steps in our work have been: 


1) Generalization of the ECON model 

2) Definition of stochastic control problem 
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3) Definition of infinite-horizon discounted rewards 

4) Definition of information states 

5) Approximation as finite-state Markov chain, 

and these results will be presented in Section 2. The choice of algorithm 
will be discussed in Section 3. We have applied the algorithm to a sim- 
plified two-period, one country model , both for debugging purposes, and 
also to get a feel for the behavior of the algorithm in a more tractable 
problem than the many-thousand state Markov chain model based on larger 
ECON model. For a fixed quantization of the state-space, convergence 
is relatively fast and monotonic, and we have every reason to believe 
that this rate of convergence will be nearly achieved by the larger model. 
The memory requirement of the algorithm is a small multiple of the total 
number of states. Then the memory requirement increases with the number 
of discrete states. Our program is written for a generalized ECON model 
which includes multiple crops, countries, and harvest times, so no re- 
programming will be necessary for extended models. 

In Section 4, a one-country, two period example is treated in detail 
and problems concerning the use and convergence of algorithms, and a 
comparison with the ECON results for the same example are considered. 

A number of interesting differences between the results are also pre- 
sented. 

Three appendices are included at the end of the report to provide 
further details on the Markov programming algorithm. 
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2. THEORETICAL FORMULATION 


2.1 Introduction 

The first step in finding a suitable algorithm is posing the problem 
in the proper theoretical framework,, so that convergence and uniqueness 
of the solution can be assured. Ultimately, we will formulate the model 
as a finite-state Markov chain, where the states represent the information 
available to consumers and producers about stocks and crops, namely 
their estimated values.- But before we can logically get to this stage, 
we must start at the foundation, the real-world variables and model. 

We will then show how this is explicitly simplified to the information- 
state model, and then to the finite-state model, which lends itself to 
computer solution. 

We begin by reexamining the ECON model in terms of real-world 
quantities and relationships. 

2.2 ECON Model 

In the ECON model there are two types of grain: grain which is 

growing on a farm and has not yet been harvested, and grain which has 
been harvested, but not yet consumed; e.g., in transit, in storage re- 
serves, etc. Let us call these grain types 1 and 2, respectively. 

Grain can exist, within the ECON model's discrimination, in one of 
two places: in the US, or in the rest of the world (ROW). The ECON 

model, then, is concerned with four real-world quantities, and although 
the state-space is not the quantities themselves but only estimates of 
them, let us momentarily take the state variables to be the real-world 
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quantities; They are: 

Type 1 grain in US, X2 (unharvested, in the ground) 

Type 2 grain in US, (harvested, unconsumed) 

Type 1 grain in ROW, x^ (unharvested, in the ground) 

Type 2 grain in ROW, x^ (harvested, unconsumed) 

Now suppose that the world wheat market is in some state, which is 
a value x(t)e eJ of (xj X2 x^ x^)*. Consumers, producers and exporters 
have access to certain limited imperfect public information about x(t), 
which we can call I(t), on which they base their consumptions (y^ in US, 
y^ in ROW), plantings (y3, y^), and exports from US to ROW (y2). The 
state of the wheat market at t+1 is then a function of x(t), y(t), and 
some random disturbances v(t): 

x(t+l) = f(x(t),y(t),v(t),t). 

f is, for any t, a linear function in x, y and v, due to the simple 
additive and subtractive nature of consumption and production. Spe- 
cifically, the equations are: 

'x^(t) - y^(t) - y2(t) + v^(t) non-harvest period 

(US) Xj(t+1) = J 

Ixj(t) - yj(t) - y2(t) + Vj(t) + X2(t) harvest period 

fx3(t) - y^(t) + y2(t) + V3(t) non-harvest period 

(ROW) X3(t+l)=| 

[x3(t) - y4(t) + y2(t) + V3(t) + x^(t) harvest period 

0 post harvest and pre-planting 

(US) X2(t+1) = ^ X2(t) + V2(t) non-planting, pre harvest 

X2(t) + y3(t) + V2(t) planting, pre harvest 

*eJ denotes the positive quadrant of the 4 dimensional Euclidean space. 
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post harvest, pre-planting 
non-planting, pre harvest 
planting, pre harvest 


(ROW) X4(t+1) 


X 4 + V4(t) 

X4(t) + yg(t) + V4(t) 


or, for suitable choice of matrices M and N (which will have either 0, 1, -1 
as elements): 

x(t+l) = M(t)x(t) + N(t)y(t) + v(t) 

The following inequalities restrict the choice of y: 

1. y s 0 

2. y^ = 0 during nonplanting periods in US 

3. yg = 0 during nonplanting periods in ROW 

4. yi + y2 ^ Xi 

5. y^ < X3 

Inequality (5) takes into account the transportation lag for exports of 
about one month. It should be noted from the above equations that the ma- 
trix N could be eliminated by defining u^ = -y^ - y^, u^ = -y 4 + y 2 , 

1^2 = y 3 > and u^ = y^ so that 

x(t+l) = M(t)x(t) + u(t) + v(t) 

Let us now determine the constraints on u. From the constraints on y,, y^ 

w 0 

we evidently have: 


f[0,oo] during planting period in US 

U2 e ^ 

[[0] during nonplanting period in iUS 


2 '. 


'[0,“] during planting periods in ROW 
[0] during nonplanting periods in ROW 
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The constraints in Uj can be found by rewiting inequality (4) with the 
substitution = -Uj - y 2 * 

Yl + y2 = ~^i - ^2 ^2 " ■‘^1 - 

so Uj s -Xj. An upper bound on Uj is obtained from the inequalities 
y^ > 0 and y 2 ^ 0: 

yi = -Ui - y2 ^ 0 ^ -^2 

y2 > 0 -> -y2 ^ 0 - 0 

SO 

3'. Uj 6 [-X^,0] 

For the constraint on U 3 consider the following four inequalities: 

^ 0 

Y4 ^ 0 

y2 ^ 0 

74 ^ X3 

Rewrite them in terms of u and y 2 - 
-Uj - yj 2 0 

yj - Uj 2 0 
Y2 ^ 0 

y 2 - U 3 < X 3 
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Since > -Xj, we conclude from the first inequality that ^ Xp and 
from the other three we then have 

4'. Y2 e [max(0,U3), ’’^^'’(XpX^+Ug)] 

It is easily seen that for the interval in 4' to be nonempty, U3 must be 
between -x^ and Xp' 

5 . u^ € [“X^jXj] 

The constraints 1' -5' in u and y^ are equivalent to the constraints 1-5 
in y, but the dimension of the control space has been reduced by one, and 
the state transition equation has been simplified. 

The inequality constraints present a major hurdle in solving the 
problem; two other difficult areas are defining the information I^(x) 
available to the market, and the statistics of v(t). No reliable es- 
timates of the statistics of v(t) are available, and the principal 
ingenuity of the ECON formulation, although not entirely successful, was 
to circumvent the need for such statistics. We will see, upon careful 
derivation of the ECON approach, that there is actually no way around 
this problem. Reasonable assumptions must be made, and stated explicitly 
for scrutiny. 

l^(x) presents problems because the information actually available 
to the market, a history of controls and state, observations, has ar- 
bitrarily large dimension. From separation [2] we know that this in- 
formation can be reduced without loss of optimality to a probability 
distribution p^(x(t)|I^), but we still must choose a consumption law which 
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is a function of a probability function. We will follow ECON in defining 
I^(x) to be the "best" estimate of x(t) given past observation. Since 
x(t) is governed by time-varying linear equations, we know that the best 
estimate is a Kalman filter estimate which we shall denote x(t). Because 
certainty equivalence does not hold (due to the inequality constraints), 
the market cannot act optimally given only x(t). But it is a reasonable 
assumption if we must keep the closed-loop state dimension to a minimum. 

In fact, ECON took only x(t) to be the state, and this is technically 
not a "state", since the statistics of x(t+l) cannot be determined from 
x(t) alone. Actually the state-space must be extended to (X,P) for the 
state quality to remain, where P is the covariance matrix of x. We will 
discuss this issue in more detail a little later. 

To actually turn the ECON Model into a living and breathing economic 
organism, we must postulate the mechanism by which y(t) is chosen, i.e. , how 
the consumer, producer and exporter actually behave. It is at this point 
where economics per se enters, and we must hope that the economic assump- 
tions are strong enough to withstand the additional battering of approxi- 
mations in solving the stochastic control problem. 

Let us summarize the economic assumptions which directly affect the 
problem formulation. It is assumed that y(t) is a function of 5<(t) and t 
which optimizes some properly defined overall welfare measure. That is, 
consumers, producers and exporters are "optimal controllers" of overall 


welfare. 
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Let F(t, x(t), y(t)) be a measure of overall welfare at time t. Then 
participants behave to maximize their overall future welfare, discounted 
by a factor p each period of time. That is, y(x,t) is chosen to 
maximize: 


WP(x(t)) = E[ I x(f), y(t‘))] 

t'=t 

which we call the "discounted future welfare." This quantity depends on 
the starting state x(t). 

We will defer questions of uniqueness of existence of a solution of 
the above problem to the finite-state formulation, where the results are 
quite clear. First let us define our generalized model, and reexamine 
the state-space representation problem. 

2.3 Generalized Model 

In the interest of future research, we have generalized the ECON model 
to an arbitrary number of countries, with arbitrary planting times, harvest 
times, and fractional harvests. The general program has not been sub- 
stantially harder to write, but it has enabled us to observe the behavior 
of the algorithm on smaller and more tractable models. Also, it should 
allow future researchers to examine larger models without any reprogramming. 
A schematic of the physical model is shown below. 
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We follow the same convention that Type 1 grain is growing but not yet 
harvested, Type 2 grain is harvested but not yet consumed. Notice that the 
definition of x^,X 2 is switched around from the ECON model. Circles in 
the above diagram are locations of Type 1 grain; we call them aggregate 
crops . The number of aggregated crops is arbitrary. In the ECON model there 
are only two aggregated crops: in the US and in ROW. Squares in the 

above diagram are locations of Type 2 grain; we call them aggregated bins. 

In this model, there is a state variable x^ for each circle and square. 
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representing the amount of Type 1 or Type 2 grain at that location. Or- 
ganization into countries is arbitrary. 

The dynamics are analogous to the ECON model, except for harvesting, 
which is more general here. Planting takes place during restricted 
periods of the year, and the amount of planted grain u^. is simply added to 
the amount already existing x^-, with random variation v^-(t) due to weather 
and other uncertainties: 

x^.(t+l) = x.(t) + u-(t) + v-(t) ie aggregated crop 

t a planting season of i 

At the end of the harvest season we simply put: 

x^(t+l) =0 i e aggregated crop 

t post-harvest season 

Grain is harvested over a sequence of periods, and the fraction of 
the total crop harvested at period t is defined as hfr(t). Let j be an 
aggregated bin, i be an aggregated crop feeding j, and u. be the net 
result during period t of all consumption, imports and exports, analogously 
to our previous definition. Then 

Xj(t+1) = Xj(t) + Uj(t) + hfr^.(t) X x^.(t) 

During non-harvest season, hfr^. = 0. The constraints on the controls u 
are somewhat complicated and depend on the import-export assumptions, but 
the inequality constraints will have similar form to constraints 1' -5* 
in Section 2.1 

The state dynamic equation is, with suitable choice of M(t), 
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x(t+l) = M(t)x(t) + u(t) + v(t) 

2.4 Stochastic Aspects of the Model 

Up to this point we have avoided precise formulation of the pro- 
babilistic aspect of the model, because there are many problems and it 
is best to bring the discussion of them together in one section. It is 
now time to put the stochastic problem on a firm footing. 

Controls, representing the behavior of consumer, producers, and 
exporters, are chosen according to a noisy state observation. We will 
assume that the information set I(t) = (z(0), u(0), z(l),..., u(t-l), z(t)) 
is available where 

z(t) = x(t) + w(t) 

and w(t) are zero-mean independent Gaussian random variables. If v(t) is 
Gaussian also, then the optimal estimate of x(t) given I(t) is just the 
Kalman estimate of x(t), which we call x(t|t). 

We make the additional assumption that 

u(t) = g(t, x(tlt)) 

Although x(t|t) is an optimal estimate, x(tlt) may not contain enough 
information about the probability distribution of x(t) given I(t) to 
make an optimal choice of control. Nevertheless the problem quickly 
becomes intractable if we allow higher moments. 

We can now write the stochastic equations of the economic system: 

1. x(t+l) = M(t)x(t) + u(t) + v(t) 
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2. z(t) = x(t) + w(t) 

3. I(t) = (z(0), u(0), z(l), u(t-l), z(t)) 

4. X(tlt) = E(x(t)|l(t)) 

5. u(t) = g(t, x(t|t)) 

6. E{v(t) v^(t)} = Q 

7. E{w(t) w^(t)} = R 

Under the assumption that v and w are independent and Gaussian (3) and (4) 
can be replaced with the appropriate Kalman filter equations: 

3'. v(t+l) = z(t+l) - x(t+l|t) = z(t+l) - M(t)x(t|t) + u(t) 

4'. x(t+l|t+l) = M(t)x(t|t) + u(t) + K(t)v(t) 

where K(t) is the Kalman filter gain which depends on Q, R and t, but not 
on the observations z(t) or controls u(t). Let us write K(Q,R,t) to be 
more explicit. It turns out that v is independent from x(t|t) and is 
Gaussian with mean 0 and covariance K^^(Q,R,t). Thus equation 4' can 
be written 


4". x(t+l|t+l) = M(t)x(t|t) + u(t) + <i)(t) 

where (|)(t) = K(t)v(t), and 4" has the state property. In other words, it 
is not necessary to know the true state x(t) in addition to x(t|t) to 
determine the statistics of x(t+llt+l); x(t|t) alone will do. 

The problem now remains to determine the statistics of the random 
variable <J)(t). E{<})(t)} = 0 since E{v(t)} = 0, and E(<{)(tj)<|)‘ (t 2 )) = 0 if 

tj f Let K^^(t) = E((l)(t)(t)' (t)). As we mentioned earlier, K^^(t) 
depends only on R and Q, but we can obviate the need for Q if the state 
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error covariances are known: 

P(t) - cov{x(t) - x(t|t-l)} 

The standard Kalman filter equations then give: 

K^y(t) = P(t) + R(t) 

K(t) = P(t)[P(t+l) + R]“^ 


and thus 

K^^(t) = P(t)[P(t) + R]"^P(t) 

Thus the covariance of (p, can be determined either from the pair 

(Q,R) or from (R,P(t)), but not from P(t) alone , unless some additional 
assumptions are made. For example, if the wheat growing process is as- 
sumed to be of "random walk" nature, that is, the uncertainties are rep- 
resented as a sequence of random weather influences represented by zero 
means random variable e(T-i) as follows: 

x(TlT-i+l) = x(T|T-i) + i = l,...,12 

where x(T|T-i+l) is the estimate of x(T) at time T-i+1, the 

additional information available at time T-i+1 on the final yield x(T). 

can be seen as innovation sequence. 

In such a case K^^(t) can be inferred from the knowledge of P(t) and 
<!><p 

the statistics of e-j-.-j* 

We now see that the Kalman filter evolution can be logically separated 
from the real system since the noise term (J)(t) is white and independent of 
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X. Thus we can restrict our attention to the state equations (for simpli- 
city x(t) denotes x(t|t)): 

4". x(t+l) = M(t)x(t) + u(t) + <f)(t) 

5. u(t) = g(t,x(t)) 

The control u(t) = g(t,x(t)) is chosen to maximize the discounted future 
wel fare 


l/P(x(t)) = E I x(f), y(f)) 

t'=t 

Since x(t) is not known to the economy, it is logical to replace x(t) 
with x(t) : 

8. WP(x(t)) = E I x(t'), y(f)) 

t'=t 

Equations 4", 5 and 8 constitute a stochastic control model. In 
Sections 2.5 and 3 we will discuss several mathematical methods for 
solving these problems and finding optimal controls. 

2.5 Approximation by Discrete States 

For solution on the computer, it is necessary to approximate the 
model by discrete states. The method by which this approximation is made 
is very important and affects the results obtained, but discrete models 
are very well understood, good algorithms exist for solving them, and the 
questions of existence and uniqueness of their solutions answered easily. 
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Suppose that the set of possible states x(t) is finite. For the 
agricultural model, M(t) is periodic, and g is also periodic in t, so 
it is logical to choose the discrete sets to be periodic also, since 
the steady state solution will be periodic. 

Let s be the total number of discrete states through one entire cycle 
and let a state be uniquely specified by an index j, 1^ j<s. We must 
choose a matrix {P. .} which accurately reflects the probability that 

• h ’^2 

(t+1) = X. if x(t) = X. . 

J2 Jl 

The continuous probability distribution of x(t+l) given x(t) = x. 
is the same as that of M(t)x. + (|)(t) +g(t,x- ) where M(t)x- + g(t,x- ) 

Jl • Jj Jj 

is an additive constant and (J)(t) is a zero-mean Gaussian random variable 
with covariance matrix K^^(t). The problem is to divide the area under the 
p.d. curve for x(t+l) between the possible x. . Those states x. which 
are defined at a different time period in the cycle receive no probability. 
For those defined at t, we might use an integration between midpoints as 
illustrated below. We will discuss methods in more detail in the section 
on Algorithms. 
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Whatever the approximation scheme, the resulting probability transition 
matrix P depends on the feedback control g. 

For any choice of g, P(g) will be a cyclic transition matrix . For 
these type of matrices there exist for any x. a limiting average state 

J 

probability TT.(g), defined as 

J 

1 T ^ 

TT-j = Tim ^ I Pr(x(t) = x.) 

^ t=0 J 

In [7], a more general concept is defined which is directly appli- 
cable to discounted Markov programming problems: 


00 

i = I P^Pr(x(t) = x. |x(0) = x. ) 
J1J2 ^2 ^1 


for if we define a discrete version of W^ to be 


I Pr(x(t) = x. lx(0) = x. )xF(t, X. , g(t, x. )) 

•^1 t=0 jyi ^2 ^1 ^2 

then (in matrix notation): 

wP = n^k 

where k=(F(l, Xp g(l, x^)), F(l, X 2 , G(l, X 2 )),..., F(nper, x^, g(nper, x^))' 
Thus the problem is to choose g to maximize simultaneously all of the 
elements of the vector: 


nP(g)k(g) = wP(g) 



18 


It can be shown that if {P(g)> is compact, which will be assumed in this 
study, then such an optimum exists. Thus by formulating the problem as a 
discounted Markov program, the question of existence works out automa- 
tically. Powerful Markov programming techniques can be applied, and that 
is the subject of the next section. 
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. 3. ALGORITHMS 

Before we explain the details of the computer algorithm implemented, 
let us review briefly the constraints of the problem and their relation to 
different algorithms in the control literature. 

3.1 LQG (Linear-Quadratic-Gaussian) Method 

To apply the classical results of LQG theory, a cost function must be 
defined. From our previous discussion, this cost function would have the 
following form: 


I [y(t)' A(t) y(t) + y(t)' B(t)]. 
t=0 

The above cost function does not depend on x(t); therefore if the LQG 
were applied directly to the problem, ignoring the inequality constraints, 
then the solution would trivially be 

maxy(t)’ A(t) y(t) +y(t)' B(t); 
y(t) 

the inequality constraints are the essence of the problem. 

Penalty and barrier methods can be applied to handle inequality con- 
straints in general programs, but with the LQG method we are constrained to 
use quadratic costs, which cannot approximate inequality constraints. Another 
method to handle the inequality constraints is to normalize the controls about 
a nominal trajectory; then the question is how to find a "nominal" trajectory. 
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3.2 Dynamic Programming 

In a dynamic programming algorithm, the basic idea is to find an optimal 
value function V^(x) for each state x, for then the optimal controls satisfy 

y(x,t) maximizes F(t,y) + E[pV^^^{M(t)x(t) + N(t)y(t) + $(t)}]. 

The approach taken by ECON was to choose Monte-Carlo outcome of <I>(t), thus 
eliminating the expectation operator and perform a deterministic maximization. 
This is completely invalid, as it is equivalent to assuming that consumers and 
producers can make their estimates based on a forecast not yet known, one time 
step ahead. Alternatively the operations of maximization and expectation are 
commuted, which, in general, is invalid. A more valid way to remove the 
expectation operator would be to move it inside of V, thus replacing $(t) 
with its expected value, 0: 

F(t,y) + pV^'^kM(t)x(t) + N(t)y(t)]. 

The other drawback associated with the dynamic programming approach 
is the predetermined quadratic assumption on the optimum welfare V^(x). 

The choice of quadratic, rather than other types of nonlinear functions, 
for V^(x) is dictated by the necessity of keeping the problem computa- 
tionally tractable. However, in many cases such a predetermined behavior 
leads to wrong result. For instance, if the incremental value function 
F(t,y) is independent of the state (which means that no storage cost is 
taken into account), then the optimal policy y* will not depend, in general, 
on the value of the state (the value of the state may influence y through 
constraint, if constraints on y depend on the state). Similarly the cost 
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lim V^(x) will become independent from the state variable x. This case 
will be further discussed in Section 4. 

3.3 Introduction to Markov Programming * 

The biggest drawback to Markov programming is that the function V 
must be stored; hence its argument (the state space) must be discretized 
(cf. Section 2.5), and for a problem of the dimensionality of the wheat 
model, this can lead to serious storage problems for even coarse discre- 
tization (Bellman's "curse of dimensionality"). 

A coarse discretization leads to two fairly serious problems. The 
first is knowing where to choose the discrete states; a bad initial choice 
will give a meaningless answer. In fact, it may even give an answer that 
would lead one to requantize the state space in the wrong direction. Thus 
one must be careful in choosing the initial scheme. The second problem is 
dealing with the endpoints. Consider approximating the value of x(t+l) 
where x(t+l) has Gaussian distribution shown, and V^(x) is known at the 
discrete points. On the basis of the known values of V^^^(x), it is not 
possible to accurately estimate E[V^^^(x(t+l))]. One would like to rule 
out such possibilities, but doing so is in effect adding new inequality con- 
straints, In fact, just about any way one would care to define an expected 
value for the above problem will lead to strange effects near the endpoints 
of the approximation scheme. We have found, in our initial tests, that these 
effects can be so strong as to force the controls y(t) to be chosen to always 
place the distribution on an endpoint! The solution of this problem is to 
carefully choose the discretization scheme so that the optimal solution will 
not be close to the endpoints. 


*For details of Markov programming, see Appendices A and B. 
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Despite these problems, there are many advantages to Markov program- 
ming. First, very speedy and efficient methods exist for solving them. 
Moreover, any distribution, rather than strictly Gaussian, may be used 
for the noise variable. Besides, monotonic bounds on the optimum values 
are available at each iteration and it is easy to prove that a solution 
to a Markov programming is indeed the current solution. With these 
advantages and drawbacks clearly in mind, let us now give a more formal 
description of the Markov programming method. 

Recall our definition in Section 2.5 of P(g) the probability transition 


matrix between discrete states, dependent on the feedback control g, and 

r 0 P (g)'J 

k(g) the incremental value function. P(g) is cyclic of the form of — ‘ 

Define C(g) and J^(g) to be the unique vector solutions of the following 


matrix equation: 


p[P(g) - I] C(g) + k(g) = 


( 1 ) 


Existence and uniqueness are guaranteed for p<l [7], and C(g) rep- 
resents the discounted objective value vector, within a constant of V^(x), 
and C(g) is the discounted analogy of Varaiya's [6] "dual variable." It 
also turns out that jP = wP(g). The basic idea behind Markov programming 
is that the optimal feedback control g* must maximize the following expression: 


g* = arg max k(g) + pP(g)C(g) 


( 2 ) 


One simply begins with a "naive" C(g) (any initial given value will 
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guarantee convergence), call it Cq, finds the gj^ which maximizes equation (2), 
and then updates Cg to Cj^ = C(g^) in the following way. Rewrite equation (1) 
like this: 

[pP(g) - I]C(g) + k(g) = JP+(1- )C(g) 

One of the fundamental results in [7] is that J^+ (l-p)C(g) is a vector 
with equal entries, call it J^ = al Where 1 = (11. .. 1) ' . Thus 

pP(g)C(g) + k(g) = al + C(g). 

Since the additive constant al is irrelevant, set = k(gj) + pP(gj)Cg= 

«1 + C(gj^). Notice that the left side of equation (3) is the very expres- 
sion that was maximized in equation (2), therefore simplifying the compu- 
tation further. 

Varaiya's "dual method" is very close to this and is easily explained 
with this background. Instead of setting = k(g) + pP(g)Cg, he sets 
Cj = SCk+pPCg] + (l-S)Cg. This guarantees convergence under slightly more 
general conditions, which are needed for our problem, involving a cyclic 
transition matrix. It is easy to see, however, that the convergence of 
Cg, Cp ...» C* is slower for Varaiya's algorithm. 

Nevertheless we can use Variaya's idea to actually speed up the 
standard Markov programming algorithm. Recall that is only an approxi- 
mation to C(gj^). In fact if a better estimate of C(gj^) were available, 
the following maximization of would be closer to the optimal g*. Since 
a simple computation of k(gj) + pP(g^)C takes little time compared to the 
time required to find a maximization, it might be wise to compute, between 
optimizations, a finite sequence 
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C2 = (k)gj) + pP(gi)Ci)s + (l-B)Ci 

C3 = (k{gj) + pP(g^)C2)B + {l-e)C^ 

= (k(gi) + pP(gi)c„.i)e + 

Varaiya proved that C^->C(gj); thus for a little effort here we can 
get the most benefit out of the next maximization operation. Exactly 
what the tradeoff is we are not sure, that is, how large n should be. 

But our results show that more accurately calculating C(gj) speeds up 
convergence considerably, especially in the final stages of convergence. 

Finally, and most importantly, the sequence W^Cg^), W^(g2),... 
will converge to the optimal welfare; gp g2»*-- "converge"* to an 
optimal control , and furthermore 

WP'(g*) E lim (C,-+i-C.) + (l-p)C. = lim (C.^^-pC-). 

This constitutes the theoretical basis for Markov progranming. 


*If the optimum control g* is not unique, oscillation is possible. In 
practice, computer algorithms usually "prefer" one optimum over another, 
so the sequence converges. 
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3.4 Structure of the Program 

In our program the successive transition probability matrices P(g.-j), 
P(92)’‘’" stored as they are prohibitively large, bur rows are 

computed as needed. What is stored is: 

1. Details of the Economic model (number of countries, 
aggregated crops, planting times, etc.). 

2. Details of the discretization scheme (how many discrete 
levels for each variable and what the levels are). 

3. Statistics on means, variances, etc. of state variables 
at each time period. 

4. Vectors c^. , the approximate value function, a dual vairable, g^. , 
the suboptimal control, , an approximation probability distri- 
bution (used to calculate 3) and two vectors the size of c^. and 
a^. which are used as work space. 

5. Algebraic workspace, and the program itself. 

The program works as follows. An initialization program sets up a 
file with data 1, 2, 3 and 4, although only 1 and 2 affect the subsequent 
operation of the main program. Changes in Type 1 data require some redimen- 
sioning of matrices in the main program, but otherwise no changes in the 
main program are necessary. The main program takes the file with data 1, 2, 
3 and 4, iterates, and when it has converged, writes the new values at 1 , 2 
3 and 4 onto another file. A third program may be written to examine the 
output file, which contains the optimal solution, more closely. 

In the iteration Type 5 data changes most rapidly, followed by 4, 3 and 
2. Type 1 does not change. After the approximate controls and probability 
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distribution a^. have converged (data Type 4), then 3 is updated. If the 
statistics are at too great variance from the discretization scheme (data 
Type 2), then data 2 is updated. This is the basic sequence of events. 

The names of the various routines are as follows. INITIAL is the 
initializing program; it calls only one subroutine, K, which computes the 
incremental value function F^(x; g(t,x)) for any discrete state Xj, time t 



PROW 
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MAIN is the main program and calls three subroutines PMODEL, OPTIMIZE, 
UPDATE. The data file is read into a blank common area shared by all 
subroutines, except that scratch space is in a common area called SCR. 
PMODEL prints information about the economic model. OPTIMIZE computes 


9i+l = "’ax arg [k(g.) + pP(g.)c.] 


by first computing bounds on the admissible control (UBOUND) and then 
searching through the controls (ITERATE) with first a coarse approximate 
search and then a detailed accurate search. The highest values are stacked 
(STACK) for later inspection for multiple peaks. Then UPDATE is called 
to update c^. to c(g^.^^) = c.^^ by successive approximations, as we discussed 
above. MAIN then iterates OPTIMIZE and UPDATE until the solution converges 
and this is fairly fast. Then as we said, if the statistics are off from 
the discretization scheme by a significant amount, MAIN will call a program 
ORIENT to redefine the discrete states appropriately. Then FiAIN writes 
out the answer to a disk file (see illustration). 

The subroutine PROW acts as a "virtual matrix" P(g) and computes, given 
a g and t, a specified row of the matrix P. 
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4, A SIMPLIFIED MODEL: ONE COUNTRY - TVJO PERIODS 


In order to get some insight into the Markov programming approach 
and in particular into the computational problems which are involved, we 
have decided to consider the case of one country - two period model. The 
model that we are going to study is the one considered by ECON in "Eco- 
nomic Benefits of Improved Information on Worldwide Crop Production and 
Distribution with Application to Wheat, Corn and Soybeans" (contract No. 
NASW-2558 - p. 27-58) [8]. The data which will be used are essentially 
the same as in the above ECON model. 



Based on [8], the model for one country - two periods is the following: 
The year is divided into two types of period, type 1 and 2, as depicted 
in the following diagram. 


planting accomplished 



production appears 



: type 1 period 
: type 2 period 
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The state variables Xj(t) and X 2 (t) are defined as: 

Xj(t) - Mean value of total stock at time t (inventory at time t) 

XpCt) - Mean value of total quantity of growing crop (planted, 
but unharvested) 

Now at times t= 1,3,5,..., that is, at the beginning of type 1 
periods, we have: 

' Xj(t+1) = Xj(t) - y^(t) + Vj(t) 

X2(t+1) = y2(t) + V2(t) 

where yj(t) is the consumption 
y 2 (t) is the planted crop 

The second equation states simply that the unharvested period 
at period 2 equates to the planting done at period 1. 

v^(t) and V 2 (t) are stochastic terms translating uncertainties 

on inventories and production yields. They are assumed to be 
zero mean Gaussian. 

At time t = 2,4,6,. .. , that is, at the beginning of type 2 periods, 
the state equation is: 

x^(t+l) = Xj(t) + X 2 (t) - y^(t) + Vj^(t) 

X2(t+1) = V2(t) 

The second equation states simply that at the beginning of type 1 period 
(time t+1), there are no crops in the ground beside a noise term V 2 (t). 

It is possible, and desirable for computational reasons, to reduce 
the order of the system by retaining the mean value of the inventory x^(t). 
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as the sole state variable. Then: 
t ~ Xy355).*. 

x(t+l) = x(t) - y^{t) + (|)(t) 

At time t = 2,4,6,. : 

x(t+l) = x(t) - yj(t) + y 2 (t) + (})(t) 

The choice of decision variable yj(t) and y 2 (t) is constrained to: 

'O < yj(t) :s.x(t) (consumption < available stock) 

0 < yj(t) (positive production) 

4.2 Quality of Information 

In the context of the above model, the quality of information is 
directly related to the statistic of (J>(t) (or equivalently v^(t), V 2 (t)). 
4>(t) is assumed to be zero mean Gaussian; however, for the Markov pro- 
gramming approach the Gaussian assumption can be relaxed. In fact, any 
other distribution for (|)(t) can be considered. 

There are various information gathering schemes both on the level of 
inventories and on the future production. The issue is to evaluate the 
gain of the community (in terms of its welfare function) vis-a-vis an 
improvement in the information gathering scheme. In case of the Gaussian 
assumption, an improvement of information translates into a decrease in 
the noise variance. 
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In the case of the present example, the Gaussian random variable 
<J)(t) represents the inventory uncertainty at time t= 1,3,5 (period of 
type 1) (4>(t) = Vj(t)) with: 

E[cf)^(t)] = oj (t= 1,3,5,...) 

But at time t = 2,4,6,..., (f)(t) represents the sum of uncertainties on the 
inventory level and on the production. These two latter uncertainties 
being independent: 

<i>(t) = Vj(t) + V2(t) t = 2,4,... 

E[<(>^(t)] = a^ + 0^ t = 2,4,... 

Hence the variance at period of type 2 is always greater than the variance 
at period of type 1. 

4.3 Incremental Value Function 

The incremental value function is given by: 

’F(yj,y 2 ) = a^yj^ + b^y^ at t = 1,3,5,... 

i 2 9 

F(yi.y2) = a^y^ + b^y^ + + b2y2 t = 2,4,6,... 

2 

where the expression a^y^ + b^y^ is referred to as consumer welfare and 
{-(a 2 y 2 + production cost. 

The purpose of the optimal stationary control is to maximize: 

T . 

I E{pV(y (x),y 2 (x))} 
i=0 


W = lim _1_ 
T+1 



32 


where p is a given discount rate. 

Note that the incremental value function is independent from the 
state x(t) of the system, and hence from the noise term (j)(t), since this 
latter enters the system through the state x(t). If there were no state 
dependent constraints on the input y(t), one would clearly deduce that 
the optimum stationary control is independent from both the state and the 
noise characteristics. However, since the constraints on the control y(t) 
are state dependent (y(t)sx(t)), theoretically the state of the system 
may affect the optimum control y(t) and the optimum welfare through the 
constraints. But since we are considering stationary optimum control, 
for the state to affect the optimum welfare function, it is necessary for 
the stationary optimum control y(t) to hit the constraints; that is, for 
stationary consumption to equate the available stock in at least one of 
the two periods. But in general this is not the case, or at least not a 
desirable case. If the consumption equates the available stock, one has 
to alter the incremental welfare function so that the corresponding opti- 
mum consumption/production policy would not deplete the available stock 
at any period. And in this latter case, the optimum control and welfare 
will be independent from the state and the noise. 

Notwithstanding the above discussion, it is apparent that a state 
independent incremental value function, such as the one used by ECON in 
the one country - two period example [8], is not appropriate to measure 
the benefit of improved information on the noise statistic, because of its 
general lack of sensitivity. To overcome this problem one has to make the 
incremental value function state dependent. A natural way of doing so is 
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to add a term representing the cost of storage, and hence depending on 
the state x(t). There are several possible choices for such a cost func- 
tion. We have chosen to represent the term referring to the storage by: 

+ a3(x - .8M.)(x - 1.2M^.)» i = 1,2 

where is the desired mean at periods of type 1 and of type 2 and a^ 
is a positive constant. The intuitive meaning of the above term is that 
the community would favor an inventory within 20% of a mean M^. for 
which the storage capacities are prepared. Any inventory values out- 
side the desired 20% range is disfavored. To be more precise, the in- 
cremental value function F is defined as: 

t = 1,3,5,... 

FEy^Ct), x(t)] = a^y^Et) + bjy^(y) + Efa^ExCt+l) - .8M2][x(t+l) - I.2M2]} 

Note that the last term refers to the estimate of the storage cost at time 
t+1, that is at period of type 2. M2 is the desired mean at period of 
type 2. (Note that M2 may be state dependent, that is, M2 at time t+1 
may depend on the value of the state x(t).) 

Carrying on the expectation operation, we deduce: 

F(y^(t), x(t)) = ajyj(t) + b^y^Ct) + a^ExCt) - yj(t) - .8M2] 

•Ex(t) - y^(t) - I.2M2] + a3E[(j)^(t)] 

t 1,3,5,... 

Similarly for type 2 period we have: 
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F(yi(t). y^{t), x(t)) = ajy^(t) + b^y^(t) + a 2 y 2 (t) + b 2 y 2 (t) 

+ a3[x(t) + y2(t)][x(t) + y2(t) - yi(t)] + a3E[cj)^(t)] 

t = 2,4,6,... 

Before ending this section let us note that the numerical evaluation 
of the optimum policy y* and the corresponding welfare value under the 
assumption that F has no state dependency, i.e. , 83 = 0 , confirms the 
previous theoretical claim, that is, both the optimum policy and the op- 
timum welfare remain insensitive to changes of noise variance. But, sur- 
prisingly, the numerical results of the ECON treatment of this example [ 8 ] 
infer that both the optimum policy y* and the optimum welfare change when 
the noise variances vary. We can explain this apparent paradox in the 
following way. In the dynamic programming used by ECON, at each itera- 
tion one maximizes: 

F(y^(t), y 2 (t)) + pE{V^+l(x{t+l))} 

where V^'‘’^(x(t+ 1 )) is the optimum value of the welfare function from 
time (t+1) to infinity. F is independent from the state x(t), but for 
computational tractability E{V^'^^[x(t+l)]} is made to be a quadratic 
functional in x(t) (see Section 3). This computational necessity changes 
the nature of the optimization problem and forces the optimal control y* 
and the optimal welfare to depend on the state x and hence on the process 
noise variance. Hence the numerical results of ECON concerning the pre- 
sent example do not correspond to any benefit, in terms of the specific 
welfare function involved. In fact, there is simply no benefit in 
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information- improvement if there is no dependency between the incre- 
mental value function and the state. 

4.4 Discretization - Probability Matrix P(y) 

We have used 5 to 13 discrete values to represent the state at 
period of type 1 and type 2. For the sake of representation, assume that 
we have only 5 discrete values for each type of period. 

Let us call Sj the mean value of the total inventory at period 1 and 
Vj its variance. Sj and are provided by past statistics (either by Kalman 
filtering or other time series analysis). 



There are various criteria for discretizing the state value around 
Sj. One may use "equal area" criterion, that is, the area under the p.d 
curve between two consecutive discrete values is the same. Or we can 
use a simpler "midpoint" scheme which consists of taking the sequence of 

3v, V, 

Sj-2Vj, Sj- 2 » ^1~”2~* ^2^>***» 
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as discrete value. For the present example this last discretization 
scheme has been retained. Note that there is no restriction as to the 
type of discretization to be used. Similarly for the second type of 
period, the state is discretized around Sg with variance V 2 * 

Let us call the discrete value of the state at period of type 1 

and $ 2 ^ the discrete value of period of type 2. Then the probability 
distribution matrix, corresponding to the case of i = 5, is cyclic and 
given by: 


p (y^^ »y2 ) 


with 


P^.j = p[x(t) = S2j|x(t-1) = Sj^yi,y 2 ]; i^7. j>8 
P^-j = p[x(t) = Sjj|x(t-1) = S 2 ^.yi»y 2 ]i j<7 

Note that for is 7, corresponding to the first period, the production 
equals zero, hence: 




P^-j = P[x(t) - S2jlx(t-1) = Sj.,y^] for i <7 



To evaluate P^.j, assume that at time t-1, the state is in the con- 
sumption being y^(t-l), the probability distribution for the state x(t) 
is given by: 


f{x(t)|x(t-l) - Sj^., y^(t-l)} 


1 [ [x(t)-($i)-y;)]^' 

72,'eu"(f-i) '’"’"I 2E[/(t-i)] 


where E[c})(t-1)] = variance of the first period. This probability distribu- 
tion is shown in Fig. 



The probability p[x(t) = S2jlx(t-1) = .y^Ct-l)] is then computed as 

s . +s • s .+s 

the area between — — and under the probability distribution 

f{x(t) |x(t-l) = S^^.,yj(t-1)). That is, 




1 

/2TTE((|)nt-i)) 


(T-(S,,-yi(t-l)))^ 

2 exp dT 

2 
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The probability corresponding to the endpoints, say to $22 and $25, 

^ 21^^22 

are computed as the area between and 5 for P.., and between 


^ 25‘^^24 


il 


and +» for P-c- 

ID 


Similarly the probability P.. for is8 is computed; the only difference 

* J 

is that the mean is replaced by (S22-y2(t-l)+y2(t-l)) and the variance 


corresponds to the second type period noise. 

At this point the essential elements of the Markov programming al- 
gorithm of Section 3, that is, the value function F and the transition 
matrix P(y2,y2) have been determined. 


4.5 Data 

Incremental value function: 
a^ = -2. 

bj = 840. 



P = .971 

Mean and variance of inventories at period 1 and 2, used for discretization: 

M, = 391.3 (millions Vj = 38.8 
of metric 

M2 = 217.1 


V2 = 43. 



39 


4.6 Numerical Results 

Tables la - Ic contain optimal production/consumption policy under 
three different information schemes for the case of 5 discrete values per 
period. 


Period 1, E[(f.^(t)] = 784 

Period 2, E[<|)^(t)] = 1764 

State 

Consumption 

Production 

State 

Consumption 

Production 

313.7 

152.7 

0 

131. 

131. 

372.1 

352.5 

163.2 

0 

174.1 

174.1 

372.1 

391.3 

175.7 

0 

217.1 

176. 

345.2 

430.1 

186.7 

0 

260.2 

181.2 

319. 

468.9 

198.5 

0 

303.2 

186.6 

292.1 


Table la 


Period 1, E[<{»^(t)] = 196 

Period 2, E[<j)^(t)] = 1764 

State 

Consumption 

Production 

State 

Consumption 

Production 

313.7 

148.2 

0 

131. 

131. 

372.1 

352.5 

160. 

0 

174.1 

174.1 

372.1 

391.3 

179.9 

0 

217.1 

176.1 

345.2 

430.1 

183.8 

0 

260.2 

181.2 

319. 

468.9 

197.4 

0 

303.2 

186.7 

292.1 


Table lb 
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p 

Period 1, E[(f) (t)] = 441 

Period 2, E[<J>^(t)] = 784 

State 

Consumption 

Production 

State 

Consumption. 

Production 

313.7 

148.4 

0 

131. 

131. 

378.1 

352.5 

163.2 

0 

174.1 

170. 

375.2 

391.3 

176.1 

0 

217.1 

175.4 

348. 

430.1 

186. 

0 

260.2 

180.9 

320.6 

468.9 

197.5 

0 

303.2 

186.4 

293.2 


Table Ic 


The transition probability matrices corresponding 

to la through 

Ic are: 

0.3805E+00 

0.5108E+00 

0. 1059E+00 

0.2788E-02 

0.8225E-05 ' 



0.9464E-01 

0.4943E+00 

0.3721E+00 

0.3853E-01 

0.4838E-03 



0.1211E-01 

0.2248E+00 

0.5573E+00 

0.1966E+00 

0.9167E-02 

■ Pij’ 

i<7 

0.5877E-03 

0.4330E-01 

0.3887E+00 

0.4817E+00 

0.8570E-01 


js8 

0.1278E-04 

0.3755E-02 

0. 1245E+00 

0.5282E+00 

0.3436E+00 ^ 



0. 1766E+00 

0.3215E+00 

0.3229E+00 

0.1464E+00 

0.3267E-01 ’ 



0. 1766E+00 

0.3215E+00 

0.3229E+00 

0. 1464E+00 

0.3267E-01 



0.1024E+00 

0.2630E+00 

0.3536E+00 

0.2147E+00 

0.6636E-01 , 

■ Pij’ 

i>8 

0.6135E-01 

0.2064E+00 

0.3518E+00 

0.2708E+00 

0.1097E+00 

j^7 

0.3588E-01 

0. 1544E+00 

0.3284E+00 

la 

0.3154E+00 

0.1658E+00 j 



0.1769E+00 

0.8072E+00 

0.1586E-01 

0.8811E-07 

0.5551E-16 ' 



0.2134E-02 

0.5839E+00 

0.4134E+00 

0.4966E-03 

0.9612E-10 



0.1305E-04 

0. 1292E+00 

0.8449E+00 

0.2589E-01 

0.2585E-06 

^ Pij’ 

i^7 

0.1058E-10 

0.1457E-03 

0.2918E+00 

0.7023E+00 

0.5748E-02 


j^8 

0.9613E-17 

0.2924E-07 

0.9428E-02 

0.7568E+00 

0.2337E+00 



0.1815E+00 

0.3241E+00 

0.3202E+00 

0. 1428E+00 

0.3133E-01 ' 



0.1815E+00 

0.3241E+00 

0.3202E+00 

0. 1428E+00 

0.3133E-01 



0.1049E+00 

0.2656E+00 

0.3530E+00 

0.2119E+00 

0.6426E-01 

^ Pij’ 

i>8 

0.6290E-01 

0. 2090E+00 

0.3524E+00 

0.2683E+00 

0.1074E+00 

j^7 

0.3695E-01 

0. 1570E+00 

0.3301E+00 

0.3134E+00 

0.1625E+00 ^ 




lb 
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0.2715E+00 

0.6538E+00 

0.7443E-01 

0.2398E-03 

0.1497E-07 ^ 



0.3993E-01 

0.5774E+00 

0.3723E+00 

0.9420E-02 

0.5450E-05 



0.1420E-02 

0.1736E+00 

0.6927E+00 

0.1316E+00 

0.7744E-03 

■ Pi 

i.7 

0.6470E-05 

0.1041E-01 

0.3866E+00 

0.5662E+00 

0.3682E-01 


J 

0.7435E-08 

0.1513E-03 

0.5887E-01 

0.6278E+00 

0.3131E+00 ^ 



0.5385E-01 

0.3579E+00 

0.4657E+00 

0.1171E+00 

0.5410E-02 ' 



0.4951E-01 

0.3464E+00 

0.4731E+00 

0.1249E+00 

0.6079E-02 



0.2171E-01 

0.2413E+00 

0.5108E+00 

0.2098E+00 

0.1628E-01 

■ Pi, 

i^8 

0.8536E-02 

0. 1503E+00 

0.4916E+00 

0.3114E+00 

0.3820E-01 


J 

j^7 

0.2982E-02 

0.8329E-01 

0.4224E+00 

0.4117E+00 

0.7966E-01 




Ic 

Finally, if we take the case la as the base, the gain of welfare due 
to the improvement of information (decrease in noise variances) corres- 
ponding to cases lb and Ic are: 
lb 65 ($ million) 

Ic 104 ($ million) 

Note that the improvement in case lb pertains only to the variance 
of the inventory (decrease of noise variance in the first period from 784 
to 196). In case Ic both variances of period 1 and period 2 decrease. 

The above figures are obtained under a rather coarse discretization 
(5 discrete values per period). For more accurate values one must con- 
sider a finer grid and also rediscretize the state space around the optimal 
value obtained with the coarse discretization. In this example, taking 
9 discrete values per period leads to the following optimal consumption/ 
production scheme: 



42 


Period 1, E[^^(t)! 

] =. 784 

Period 2, E[(f)^(t)]= 1764 

State 

Consumption 

Production 

State 

Consumption 

Production 

313.7 

154.5 

0 

131. 

131. 

370.3 

333.1 

158. 5 

0 

152.5 

152.5 

370.3 

352.5 

164.2 

0 

174.1 

174.1 

370.3 

371.9 

170.2 

0 

195.6 

173.7 

356.6 

391.3 

176. 

0 

217.1 

176.2 

344.1 

410.7 

181.5 

0 

238.6 

178.8 

331.2 

430.1 

187. 

0 

260.2 

181.4 

318.1 

449.5 

192.7 

0 

281.7 

184. 

304.8 

468.9 

198.9 

0 

303.2 

186.7 

291.4 


Period 1, E[$ (t)]=441 Period 2, E[(J) (t)]=784 


State 

Consumption 

Production 

State 

Consumption 

Production 

313.7 

150.7 

0 

131. 

131. 

377.2 

333.1 

157.2 

0 

152.5 

152.5 

377.2 

352.5 

164. 

0 

174.1 

170.1 

374.4 

371.9 

170.2 

0 

195.6 

172.8 

361. 

391.3 

175.8 

0 

217.1 

175.5 

347.5 

410.7 

181.1 

0 

238.6 

178.2 

333.9 

430.1 

186.4 

0 

260.2 

180.9 

320.3 

449.5 

191.8 

0 

281.7 

183.7 

306.6 

468.9 

197.8 

0 

303.2 

186.4 

293, 


Ic 
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In this latter case the gain of the information improvement amounts 
to $91 million instead of the $105 mill ion computed previously. Since we 
are using a finer grid (9 discrete values instead of 5), the $91 million 
figure is more accurate than the $105 million one. Moreover, taking a 
grid of 11 and 13 discrete values for each period results in a gain cor- 
responding to an improvement of information from la to Ic equal in both 
cases to $91 million. Therefore, the grid of 9 discrete values is a 
sufficient approximation. In the following tables the optimal consumption/ 
production policies corresponding to a grid of 11 discrete values are 
shown. 


Period 1, E[<()^(t)] = 784 

Period 2, E[(f)^(t)] = 1764 

State 

Consumption 

Production 

State 

■ 

Consumption 

Production 

313.7 

154.7 

0 

131. 

131. 

370.1 

329.2 

157.6 

0 

148.2 

148.2 

370.1 

344.7 

162. 

0 

165.4 

165.4 

370.1 

360.3 

166.7 

0 

182.7 

172.3 

363.7 

375.8 

171.5 

0 

199.9 

174.2 

354.0 

391.3 

176. 

0 

217.1 

176.2 

343.9 

406.8 

180.4 

0 

234.3 

178.3 

333.7 

422.3 

184.8 

0 

251.5 

180.4 

323.2 

437.9 

189.2 

0 

268.8 

182.5 

312.7 

453.4 

193.9 

0 

206.0 

184.6 

302.0 

468.9 

199.0 

0 

303.2 

186.7 

291.3 


la 
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Period 1, E[(J)^(t)] = 441 

Period 2, E[(|>^( 

t)] = 784 

State 

Consumption 

Production 

State 

Consumption 

Production 

313.7 

151. 

0 

131. 

131. 

377.1 

329.2 

156. 

0 

148.2 

148.2 

377.1 

344.7 

161.5 

0 

165.4 

165.4 

377.1 

360.3 

166.7 

0 

182.7 

171.2 

369. 

375.8 

171.4 

0 

199.9 

173.4 

358.2 

391.3 

175.8 

0 

217.2 

175.5 

347.4 

406.8 

180.1 

0 

234.3 

177.7 

336.6 

422.3 

184.3 

0 

251.5 

179.9 

325.7 

437.9 

188.6 

0 

268.8 

182.0 

314.8 

453.4 

193. 

0 

286.0 

184.2 

303.9 

468.9 

197.8 

0 

303.2 

186.4 

292.9 


Ic 


4.7 Remarks 

Some experience has been gained in solving the example of one country - 
two period models. We feel that some of the practical problems encountered 
in this simple case will be present in the more complex setting of multi- 
country - multi-period problems. These are the following: 

a) In applying the Markov programming algorithm of Section 3, at 
each step one has to find the optimum y* such that: 

y* = arg max {F(y,x) + pP(y)C} 

y 
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It is important that the subroutine finding the maximand at each period 
be as accurate as possible (Extended Precision). This will speed the 
convergence and allow more flexibility for the choice of initial "dual 
vector" C. 

b) Theoretically, any initial choice Cq of the vector C insures the 
convergence. But from a practical point of view a bad initial choice 
vnll cause the convergence to be very slow. Notwithstanding the order of 
the vector and matrices involved, which increase rapidly with finer dis- 
cretization grid, it is important to keep the number of iterations small. 
Hence there are advantages to choosing the initial vector Cq close to the 
optimum C. 

c) As noted in Section 3, to speed up the convergence we calculate, 
as intermediate values, a sequence of n vector S, as: 

= 3[F(y,x) + pP(y)C^‘] + d-3)C^' 

The right choice of the parameter 3 and the number of iterations can 
speed up the convergence significantly. Experience shows that a choice 
of 3 in the range [.5,. 9] and an n between 5 and 20 speeds the conver- 
gence sufficiently. 

d) The convergence is relatively fast. The number of iterations 
varies between 10 and 40; it rarely increases above 50. However, as 
noted before, the initial choice of Cq together with the parameter 3 and 
the intermediate number of iterations n can strongly influence the speed 
of convergence. With appropriate choice of Cq, 3 and n the convergence 
is attained with less than 10 iterations. 
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5. CONCLUSION 


The problem of "Wheat Forecast Economic Effect" has been formulated 
and solved within the framework of stochastic control and Markov pro- 
gramming. A general approach has been developed for the multi -country - 
multi-period model and the simple case of one country - two period model 
has been solved in detail. . 

It has been shown that: 

(i) The states of the ECON model are state estimates rather than 
true states. 

(ii) The number of states in the ECON model may be effectively re- 
duced by one-half. 

(iii) The Markov programming approach avoids simulation and is ap- 
plicable to nonquadratic welfare functions and non-Gaussian errors. 

(iv) Upper and lower bounds are obtained in the Markov programming 
approach so that if iterations are stopped prior to convergence, an 
estimate of the nearness to the optimal solution is known. 

(v) In general, there is no value of information in the infinite hori 
zon stationary case if the incremental value function does not depend upon 
a state. The ECON algorithm produces a value of information in such cases 
by forcing the dynamic programming value function to be quadratically 
dependent on the state and by considering finite horizons in simulations. 

(vi) The main advantages of the dual variable Markov programming 
applied to "Wheat Forecast Economic Effects" can be summarized as follows. 
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First, speedy and efficient algorithms are available for solving Markov 
progranming problems. Second, no solution to a large set of simultaneous 
equations is required. Third, unlike the dynamic programming approach, 
there is no need for an ad hoc assumption on the functional dependency 
of the welfare with respect to the state x. 

The main drawbacks of the above Markov progranming is the memory 
requirement, since the value of the welfare function at each iteration 
must be stored. However, it was found in the one country - two period 
model that a grid size of 9 is adequate and even with a grid size of 5, 
the results are fairly close. Therefore, it appears feasible to solve 
multi-country - multi-period problems with this approach. 

Based on the results of this report, it is recommended that the Mar- 
kov programming approach be applied to the complete ECON model and to 
other value of information problems faced by NASA researchers. 



48 


REFERENCES 


1. ECON, Inc. (1977), "ECON's Optimal Decision Model of Wheat Production 

and Distribution-Documentation," NASA Contract NASW-3047. 

2. Witsenhausen, H. S. (1971), "Separation of Estimation and Control 

for Discrete Time Systems," Proc. IEEE , Vol. 59, pp. 1557-1566 

3. Howard, R. A. (1960), Dynamic Programming and Markov Processes , 

New York: Wiley. 

4. Schweitzer, P. J. (1965), "Perturbation Theory and Markov Decision 

Processes," PhD dissertation, MIT Operations Research Center 
Report IS. 

5. Odoni, A. R. (1969), "On Finding the Maximal Gain for Markov Decision 

Processes," Operations Research , Vol. 17, pp. 857-860. 

6. * Varaiya, P. 1978), "Optimal and Suboptimal Chains," IEEE Trans. Auto 

Corrt. , Vol. AC-23. 

7. * Jones, S. N. (1979), "A Differential Theory of Markov Control," 

unpublished working paper. 

8. ECON, Inc. (1977), "An Optimal Decision Model of Production and 

Distribution with Application to Wheat, Corn and Soybeans," 

NASA Contract NASW-2588. 


*These papers appear as appendices at the end of this report. 



49 


APPENDIX A 

DOCUMENTATION FOR THE 
CROP INFORMATION VALUE PROGRAM 
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DOCUMENTATION FOR THE CROP INFORMATION VALUE PROGRAM 

0. Introduction 

This document provides the necessary background to understand, use 

2 

or modify the Crop Information Value Program (CIVP) developed at S I. 

The following topics will be covered: 

1. Math anati cal Preliminaries 

2. Notation for Input-Output 

3. Using the Programs 

4. Detailed Study of the Program 

The object of 1 - 3 is to provide the user with quickest access to the 
purpose and use of the program, from a basic mathematical sketch of the 
problem, to notation for the basic parameters necessary to run the program, 
and provide detailed instructions for actually running CIVP. Examples of 
runs are provided in the computer output (pp. 

Section 4, on the other hand, is aimed at the user who needs a more 
detailed understanding of the program subroutines, internal variables, 
approximation and search methods, for the purpose of program verification 
or modification. 


1. Mathematical Preliminaries 

In this section the details of the ECON model and formulation of the 
finite-state model will be assumed familiar from our Progress Report. 

We will review briefly the results which are essential to an overall under- 
standing of CIVP. 

1.1 Discrete Markov Chains 

In CIVP, the economy is approximated by a finite-state Markov chain 
with state space X= {l,2,...,ns}, ns being the number of states. Recall 
that a state x^ t X is a multidimensional estimate of the states of crops, 
grain in storage and transit. Due to the independence of the innovation 
sequence of the Kalman Filter from the state estimates x^- themselves (see 
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Progress Report), the statistics, or probability distribution of x(t+l) 
depends only on the present state estimate x(t) = and the behavior of 
consumers, producers, and exporters in the time interval (t,t+l]. The 
behavior results in an overall control v(t). Symbolically, 

Prob (x(t+l) = Xj|x(t) = x^.) = P^.j(v(t)) 

The matrix {P-^-l is called the transition probability matrix at t. If 
we assume v(t) = v^. when x(t) = x^., then the matrix P depends only on the 
vector (v^,. . . ,v^g) = v; we write P(v). 

A discrete (controllable) Markov Chain is then defined by the set V 
of possible controls, X, the state space, and the set {P(v)},v eV of pos- 
sible transition probabil ity matrices. 


1.2 Welfare in a Discounted Economy 

If k(t, v^. , x^.) is a measure of the overall welfare of the economy 
between t and t+1 when in state x^. and exercising control v^., then it is 
reasonable to assume that the economy acts so as to maximize long-term 
"discounted" welfare defined by: 


= I I Prob{x(t) = X. |x(0) = x.}*k(t, v., x.) 
^ t=0 x,eX ^ ^ J J 


where p is the per-period discount factor and i is the starting state. 
In CIVP we define 


= (l-p)W^ 

so that as p ->1, will reach a definite limit (called J^). is a 
constant vector , i.e. , all elements are equal, and represent the average 
welfare per step . When the economy maximizes i.e., the welfare dis- 
counted into the future, instead of J^, then the overall average welfare 
per step (J^) will not be as high as possible. This is what is lost by 
some short-sightedness on the economy's part. 
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1.3 Markov Programming 

The purpose of the program is to find a v.*e V so that j‘^(v' ) s 
for an v' e V. To do this it simultaneously finds c*, a relative state 
value vector, and v*, by the method of successive approximations, c* has 
the properties 

1. V* = argmax pP(v)c* + k(v) 

veV 

2. c* = max pP(v)c* + k(v) + 

V£ V 

(Tt is implicit in this notation that some v*e V will maximize all elements 
of the vectors simultaneously. This is in fact the case.) By using the 
equations, we can generate successive approximations YqjY^**** to c*, and 
Vj,V 2 ,... to V*. The need to know is eliminated by defining 



where 1 = (1 1 1 .. 1). Then since J^ = al for some a, [J^*] = 0 and we can 
wri te 


2'. [c*] = [max pP(v)[c*] + k(v)] 

veV 


also, since p1 = 1, 

1'. V* = argmax pP(v)[c*] + k(v) 

VeV 

The method of Markov Programming is then as follows: set Vq, Yq to 

arbitrary values. Then simply take 

1". v„., = argmax pP(v)[y„] + k(v) 

veV " 

2"- Y„+i = [max pP(v)[Y_] + k(v)] 
n+i n 
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It is guaranteed that Yp'^c* and v„->v* for most conditions. Unfortunately, 
convergence is not guaranteed for the case of cyclic transition matrices. 
However, this problem can easily be fixed by Varaiya's "Dual Method" as 
explained in the next section. 

1.4 Incorporation of Varaiya's "Dual Method" 

The essential difference between Varaiya's "Dual Method" and the 
Markov Programming method as described above is that Varaiya takes 

~ = [max pP(v)[y^] + k(v) - [yj.]] 

instead of 


Yn+1 = [max pP(v)[Y„] + k(v)] 

In practical terms this amounts to taking 

Yn+i =[y„] + e[max pP(v)[y^] + k(v) - [y^]] 

Letting v^^^ be as defined by 1", and assuming [Yp] = Yp» 

2"'- Vl “ \ ^ ' % 

= e[pP(v„^j)Y„ + k(v„+l)] + (1 -€)y„ 

For small enough e, Varaiya's theorem guarantees that Yp'^c* for cyclical 
transition matrices (as well as others). Notice that if e=l, 2'" is; equi- 
valent to 2"; Varaiya's method is the same as Markov Programming. Our 
tests on small models indicated that convergence was fastest when e = 
and this value of e was used in CIVP. 

The actual algorithm implemented is now just a synthesis of the Varaiya 
Algorithm, and Markov Programming. The reason for not using Varaiya's 
method directly is that the time required to compute 1 is much, much greater 
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than the time to compute 2. Notice that if Vq were to consist of only 
a single possible control , let us say tVQ} = Vg, then Varaiya's equation 
would yield 

2. = ^(pP(Vo)v„, + MV(,)) t (I-e)y^ 

and Yp^ will converge to a c(Vq), the dual variable at Vq , at very little 

expense, since only 2 is involved. Let us now take our arbitrary Vq to 

be v^^j in Equation 2"'. Then the more closely y approximates the dual 

variable (by successive application of 1,2 above) then the 

closer will v„.-j = argmax pP(v)y + k(v) be to v* in l’,2"'. In effect, 

veV 

we can speed up convergence to v* and c* at very little additional compu- 
tational expense, by more accurately calculating the dual variable c(Vj^) 
between maximizations. 

The equations are as follows (Vq, Yq arbitrary): 

!"♦ = argmax pP(v)[y_] + k(v) 

veV " 

2.1 Y„,o ■ [pPfVlICl'nJ * 

^n,m+l ~ ^ ^ '^n,m 

^n+1 ^ ^,m 

We have inserted the [ ] at somewhat arbitrary places to assure 1 Yp^p, 
and 1 Yf, are zero. The actual implementation may be somewhat different 
for efficiency. 

1.5 Tracking Convergence - Monotonic Bounds 
From equation 2 (Section 1.3) we can write 

= pP(v*)c* + k(v*) - c* 
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Suppose that and are suboptimal but 

''n+1 argmax,pP(v')c + k(v') 
" ^ v'cV " 


Then 

1 * 

,4. J < maximum element of + k(v) - c) . 

and 

1 * 

4.5 minimum element of (pP( 

Since, after 1" (Section 1.4), 3 will hold, 4 and 4.5 provide simple bounds 
1 * 

on J at each step. In fact these bounds should be mono tonic as n 
increases. These bounds are printed out after each optimization. 

Another bound computed by the program are maximum and minimum average 
return on the present control v^^. This can be done, once again, by con- 
sidering In this case, evidently 

= argmax pP(v') + c„ + k(v') 

VcVq 

so 4 and 4.5 will hold with J^* being replaced with These bounds 

are computed at the intermediate steps v . 

n,m 

2. Input-Output Conventions 

In this section we describe the conventions and data structure used 
in specifying an economic model to CIVP, and the names of the quantities 
computed and displayed by CIVP. We begin with'^.^ input. 

2.1 Scalar Constants 

In parentheses we show the values, used in the present CIVP. 

NPER - Number of periods in year (2) 

NAGG - Number of aggregated crops (1) . 
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NBIN - Number of aggregated storage locations (1) 

DIM = NAGG + NBIN 

NS - Number of discrete states (for all periods) (14) 


2.2 State Vector Convention 

DIM is obviously the state dimension. The convention used for number- 
ing the state variables (different from ECON) is: are the ag- 
gregated crop estimates and are the storage estimates 

(there are NBIN of them). 


2.3 Matrix Data 

Dimensions of each matrix are in brackets [ ]; number of rows is first, 
number of columns is second. 


2.3.1 PLN [NAGG, NPERI 

This matrix contains entries indicating the planting periods, growing 
periods, and non-growing periods for each crop. Specifically let i e {!,... ,NAGG} 
be an aggregated crop. Then in row i, a 1 will appear in column t where 
t is the period in which the first planting is done (see illustration).,- 
If there is a second planting period, a 2 appears, etc. We now divide 


NPER 









2l_ 


_ 0 _ 

_ 0 _ 

2 









PLN 


the remaining periods into two groups: A) periods in which the crop is 

growing and the final harvesting will not begin during that period; B) 
periods in which the crop's final harvesting is taking place, or periods 
after the final harvesting has taken place and before the first planting 
occurs. We define the final harvest to be the period in which all re- 
maining crop is added to storage, so that the crop is 100% harvested by 
^e-"next period. In A periods, 0 (zero) is to appear in PLN; in B periods, 
a -1. 
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One slight restriction of this convention is that the final harvesting 
cannot take place in the same period as the last planting: it must occur 

at least one period later. In the illustration, for example, last planting 
occurs in Period 6, final harvesting in Period 1. It is not possible for 
final harvesting to occur in Period 6. 


2.3.2 HFR [NAGG, NPER] 

For each aggregated crop i, time period t, HFR(i,t) is the fraction 
of crop harvested during that period . The final harvest period is there- 
fore the first period since the initial planting for which the total 
fraction harvested is one. 


NPER 


NAGG 








H. 




H. 


1 







HFR 


-final harvest (since initial planting is 
Period 3 from PLN) 


2.3.3 [DIM, NAGG] 

This matrix shows which aggregated crops i feed which aggregated 
storage location j. Specifically NFGi(j,i) = l if crop i's harvest goes to 
O; otherwise NFC(j,i) = 0. By the state variable convention (Section 2.2), 
j>NAGG; hence the first NAGG rows of NFC are unused. 

2.4 Data for the Discretization Scheme 
2.4.1 N [DIM, NPER] 

N(i,t) is the number of discrete levels for state' variable x^ at period 
t. Restrictions: 

1) Totally harvested crops. Let us call a crop i totally harvested 
if it is 100% harvested and the first planting is not completed. Crop i 
will be totally harvested at t if and only if PLN(i ,t-l) = -1. In the ex- 
ample given (Section 2.3.1), crop 2 is totally harvested in Periods 2 and 3. 
We require N(i,t)= 1 if i is a crop, and i is totally harvested during t. 
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2) N(i,t) IS odd 

3) 3^N(i,t)s9. 

2.4.2 INDl [NPER+1] 

INDl is a time-saving look-up array. Definition: 
t-1 DIM 

INDl(t) := I n N(i,t); INDl(l) =0 
T=1 Z=1 

Note NS = IND1(NPER+1). 

2.4.3 Discrete Levels 

Discrete levels are defined by two matrices: 

M: [DIM, NPER]— the center point of the discrete levels for each state 
variable x^. and period t is M(i,t). 

SIDE: [DIM, NPER]— an archaic name and convention as well. STDE(i,t) 
is the distance between discrete levels of at t divided by 2.4. 

Example: x^. at t has levels 0,10,20. Then M(i,t) = 10, STDE(i ,t) = 10/2.4. 

2.4.4 Information Model 

The information variance, var((j)^.(t)) is carried in a matrix called 

STDl. 

STDl : [DIM, NPER] — STDl(i,t) := /var("$^Xt]T where <)> is defined in the 
Progress Report. 

2.5 Numbering of Discrete States 

Let x.^j(t) denote the level (l<j<N(i,t)) of state x. at t. 

Then a typical state vector at period t will be some (x, . (t), x, • (t),..., 

^DIM i There are NS of these discrete state vectors altogether, 

and they are ordered as follows: first the set of discrete vectors for 

t-1, then t=2, etc. In the sample runs there are 9 discrete vectors 
for t= 1, 5 for t = 2. The set of 9 discrete vectors for t= 1 is ordered 
by increasing levels with the most frequent changes in level occurring in 
the last state variable. Thus the ordering of the discrete states in the 
sample run is: 
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(Xj 


X2,i(l)) 

("l 



(Xi 


X2^3(1)) 


, 2 ( 1 ). 

X2j(l)) 


^2(1)» 




X2,3(1)) 



X2,i(l)) 






X2^3(1)) 


,l(2). 

X2j(2)) 


,l(2). 

X2^2(2)) 


l(2). 

X2^3(2)) 


,l(2). 

X2^4(2)) 


1 ( 2 ). 



2.6 Output Names 

A typical run of CIVP will print out values for various items. The 
naming conventions are: 

XINT : [DIM, NPER]— a time-saving look-up table defined below. 

FACT : [DIM, NPER]— for all x^.,t, the level (l^js N(i,t)) of the 

quantization for x^. at t is 

XINT(i,t) + j*FACT(i,t) 

KM: [NS]— a vector of incremental welfare values for each of the dis- 
crete states, arrayed in the order described in 2.5. (Same as vector k in 
1.2.) KM reflects the incremental welfare for the present suboptimal control 


V: [NS, DIM]— each column of V displays the control vector applied 
for a particular discrete state vector. That is, if x(t) = (Xj j (t),... 

Xdim i ^ 

uim.JdIm 
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x(t+l) = x(t) + v(t) + 4>(t) 

The sequence of suboptimal controls computed are the of Section 1.3. 
GAMMA : [NS]— same as y in 1.3. 

Jl— same as 1.3. 

SIGMA: [NS]— an approximation to the steady state probability dis- 
tribution TT vector which satisfies 

•it(v)P(v) = it(v) 

as converges, so does it(v). 


3. Running the Programs 

Two steps are required to run the program. The first is to create 
a data file with the necessary economic data. The second is to apply CIVP 
to the data file. 

3. 1 Running INITIAL 

A sample run of INITIAL appears of Page 10 of the computer output. 
INITIAL will ask for various scalars, vectors and arrays; all of these 
are defined in Section 2 except for STD, which is not used by CIVP and 
should be set to zero. The initial control matrix V should also be initial- 
ized to zero as this initial value is not actually used by the program, 
but instead a new value is recomputed immediately. Hence the entry for 
V is irrelevant to CIVP. 

After the data entry, INITIAL will make some computations and write 
the results out into a file with logical name INIT (the actual name is 
specified previous to running the program via ASSIGN statement). 

CAVEAT: Arrays must be dimensioned properly before running INITIAL. 

See Section 3.3. 

3.2 Running CIVP (MAIN) 

CIVP assumes two files exist: INIT and COMP are the logical names. 

INIT must have been initialized as described above. COMP should be a 
carbon copy of INIT. In the present program, no results are actually 
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written out to COMP. 

After creating COMP, INIT, and making the logical assignments, 

CIVP can be run. (See page 11.) CIVP will ask for the following: 

Annual Discount factor 

nd - determines the search time. The number of divisions of each 
dimension of the control space when searching for the optimal control. 

A subsequent fine search will then redivide the optimal segment into nd 
parts. The resulting search takes time approximately 2*nd^^^ with an 
effective grid of nd^**^^*^ points. 

nt - the M in Equation 2.3, Section 1.4. 

Then CIVP will display the model data if the user so requests. Then the 
optimization routine begins. (Page 12.) 

Each $$0PTIMIZE$$ represents a large iteration, an application of 
Equation 1", Section 1.4. After application of 2.1, the results 
(y^,Vj^^j, and k(v^^j)) are displayed. Next, after 2. 2-2. 3 have been applied, 
the bounds on are shown, followed by bounds on J^(V^+^)» 

and a (approximation to Tf(v^^.j)). The program will then request to 
begin the next iteration. 

For the examples solved, we achieved convergence in 4-8 iterations. 

Two examples are shown on pp. 12-15. The second example represents 3n 
"improved information" model over the first, and as a result the optimal 
average return came up from -2.136 to -1.930. The incremental welfare 
function was defined simply as (v 2 + 5)^ (square deviation from consumption 
of 5 units). 

3.3 Program Preparation 

Some aspects of INITIAL and miN are model dependent, and must be 
modified for each model. Specifically: 

3.3.1 Matrix Dimension 

The conmon block matrix dimensions must agree with those given in 
Section 2 with the following exceptions: 

a) mean and std are not used by the program. 

b) any matrix with dimension NS (e.g., V, KM, Gamma, Sigma) in 
Section 2 must have at least NS for its dimension in the program. For 
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example, if NS= 14, KM may be dimensional 14, 100, or 1000, but not 10. 

The inequality constraint enables the user to change the discretization 
scheme (and thus NS) without redimensioning any matrices. 

In INITIAL: 

vi : [DIM] 
jp : [DIM] 

In MAIN: 

oldgam, pi, p2, p3 : [at least NS] 

In OPTIMIZE: 

vr, vlow, vhigh, vine, j ; [DIM] 

In UPDATE: 

sigg, pi, gamg : [at least NS] 
vl, j : [DIM] 

In UBOUND: 

vr, vlow, vhigh, vine, j : [DIM] 

In ITERATE: 

Pi, gamma : [at least NS] 
vlow, vhigh, vine, j, vt : [DIM] 

In PROW: 

pi : [at least NS] 

V, j, jp : [DIM] 

3.3.2 Output Formats 

Output formats must be modified suitably in MAIN, PMODEL. 

3.3.3 Loops 

Some loops in the program are nested DIM deep, so they must be changed 
when DIM changes. Specifically, in the routines INITIAL, lines 4500:4900, 
OPTIMIZE, lines 2100:2400, UPDATE, lines 2800:3500, ITERATE, lines 1900:2100, 
3100:3500, PROW, lines 3400:3700, a loop similar to this appears: 


do - jl = 1, n(l,j) I 

j(l) = jl ^ \ 

do - j2 = 1, n(2,j) I : 

j(2) = j2 / 


- continue 
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This pattern should be repeated DIM times. 

3 . 4 Program Compilation 

To create an executing file for INITIAL you must link an object 
module for INITIAL and an object module for K. (See file directory, 

P. 11.) 

To create an executing file for fiAIN (CIVP), you must link object 
modules for MAIN, PMODEL, OPTIMIZE, UPDATE, UBOUND, ITERATE, PROW, APPROX, 
STACK, and K. 


4. Program Details 
4.1 Program Subroutines 

The structure of the program is illustrated in the following diagram, 
where each box represents a subroutine, and arrows represent possible calls 
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Briefly, the purpose of each subroutine is as follows. INITIAL asks for 
user input for the economic model, initial values, etc., and sets up a 
disk file containing a common block with this information for use by the 
main program. K is the subroutine used to calculate rewards associated 
with the discrete states. 

MAIN reads in the cotimon block from the disk file and then calls other 
routines. The first, PMODEL, prints out the economic model data. MAIN 
then calls OPTIMIZE and UPDATE in alternation to find the optimal control 
vector, and associated optimal values. 

OPTIMIZE is passed the approximate value vector and finds (see 
Section 1.4): 

!"• v„.i = argmax pP(v)[y„] + k(v) 
f’+i veV " 

by searching the control space V, first using a coarse grid, then a fine 
grid centered on the best cause value. The bounds on the possible control 
values are found by UBOUND, and ITERATE is called first for the coarse 
search, then the fine search. The five highest values found are stacked 
by STACK, and then the values of v^^^j and y' and k(v) are passed back to 
MAIN. 

UPDATE improves the approximation to the state value vector by itera- 
tion of the following equation (see Section 1.4): 

for M iterations, passes y„ ]v] back to MAIN, and MAIN y^+j = so that 
the OPTIMIZE cycle can begin again. 

In the above equations it is clear that the value of the transition 
matrix P must be recalculated with each new control In fact, if 

the state space is so large, it is not practical to store more than one 
row of P at a time. The subroutine which calculates a row of P for given 
control u is PROW. PROW uses APPROX to approximate the Gaussian distribution 
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by a finite number of points. G is the Gaussian probability distribution 
function; CEIL is the ceiling function; FLOOR is the floor function. 

Let us now take a closer look at each subroutine. 


4.2 INITIAL 

Inputs: Accepts model data at teletype. 

Purpose: To initialize a disk file (named 'INIT') with the economic 

and initial data. 

Operation: The common block defined in lines 100-800 is equivalanced 

with an array called 'data' in lines 1500-1600, so that the entire common 
block can be written on the disk as a unit in line 6000. The disk fill 
is opened in lines 2000-2100. Then, several parameters and matrices are 
read in (see Section 2 for a description and definition of these arrays) 
with lines 2200-4200. 



divide 

probability between 
all states in each 
period 


From the data that has been read in, several additional dependent 
arrays can now be initialized. In lines 4400-5500 the arrays GAMMA, SIGMA, 
and KM are initialized. Recall inducing scheme for the discrete states 
described in 2.5. The outermost index, line 4500, is the time period, j. 
The next most significant index is and represents the amount growing 
in the first aggregated crop (in this case there is only one aggregated 
crop). The least significant index, 32 > is the level of grain in the 



66 


aggregated bin. The time period j, and levels and jg completely 
determine the discrete state. If there were more than one crop in a 
bin, then there would need to be more indices to specify a state. 

The variable i is simply used to count the discrete states, for in- 
formation about the i^*’ discrete state is stored in the i^*^ location of 
each array. Thus the incremental welfare of state i, is stored in km(i) 
(line 5200). It is calculated by calling the incremental welfare value 
function K (see details of K, Section 4.3). The initial probability 
of state i is stored in sigma(i) (line 5300). It is taken to be 

l/(# of discrete states in period j * number of periods) so that I sigma(i) 

i=l 

Finally, gamma(i) (the initial discounted welfare estimate Yq) is set to . 
zero. 

Then XINT and FACT are initialized in lanes 5600-5900 by the formula 
given in Section 2.6. All the data is written out onto disk and this 
concludes the operation of INITIAL. 

4.3 K (Page 9 of the computer output) 

Inputs: A control v = (Vp . • • .v^jj^) which is usually some row of 

the feedback control matrix. A time period index t. 

Result: A real number, the incremental welfare when control v is 

applied to state x at time t, k(v,x,t) of Section 1.2. Since this does 
not actually depend on x, x is not passed. 

Operation: For the simplified model we simply took k= -(target con- 

o 

sumption - actual consumption)^. Since consumption = -v(2), this is 
accomplished by line 400. 

4.4 MAIN 

In MAIN and in subsequent subroutines, arrays and type declaration 
will appear in the following order: first comnon block definition, or 

whatever part is needed; second, the scratch common block SCR, if needed; 
and finally, local variables to the subroutine. In MAIN, all three types 
of declaration appear. 

Input: Reals in disk file 
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Output: See Section 3.2 

Operation: (Refer also to 3.2): First the disk file is read to 

initialize the common block (line 2500). The fill 'comp' is not used in 
this version of the program. The program then asks for discount factor, 
number of divisions nd for the coarse and fine search, and the number of 
update time nt, (Min Equation 2.3, Section 1.4). The annual discount 
factor is recomputed into a per-period discount factor p in line 3300. 

In line 3700 the economic data is printed out by a subroutine PMODEL. 

Then OPTIMIZE is called to find as in Equation 1", Section 1.4. The 
results are printed out with lines 3900-4100. The next step is to update 
the value approximation and probability approximation y and o per Equa- 
tions 2. 1-2. 3, Section 1.4, and also find bounds on the optimal return 
as described in Section 1.5. ATI of these functions are performed by 
the routine UPDATE. The updated values are returned to MAIN and printed 
out. 

The call ing conventions and detailed workings of PMODEL, OPTIMIZE, 
and UPDATE follow. 

4.5 PMODEL 

Arguments: None. 

. Returns: None. 

Operation: Data is acquired via the common block, and the data is 

printed out in lines 1500-2900 if the user so desires. 

4.6 OPTIMIZE 

Arguments: ND - number of divisions in coarse and fine search 

RKO - per-period discount factor 
GAMMA - the y^ as described previously (vector). 

Returns: GAMMA - y^^^ 

OLDGAM - the original value of GAMMA, namely y^ 

V - the best control matrix found by searching the control 
space; the v^^j as previously described. 

KM - incremented welfare vector for the discrete states at 
this control 
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Operation: 

1. Lines 1300-1400. GAMMA ^ OLDGAM 

2. Lines 1700-4800. Find y' . km such that 

(1) = argmax P(v)y^ + k(v) 

VeV 

(2) y' = max P(v)y_ + k(v) 

VeV ” 

(3) km = k(Vj^^j) 


Set V to GAMMA to y' . kf'i to km and return. Task 2 can be broken up 

as follows: 

2a. Index through the discrete states, i is the state number, t 
is the time period, jl is the first index, j2 is the second index (see 
Section 2.5 for more details of the state indexing scheme). Either i or 
(t, j^, J 2 ) completely specify the discrete state, i is used for some 
cases, namely looking up positions in km, gamma, etc., whereas (t, jj, J 2 ) 
is used when the acutal levels associated with the state are needed, i.e. , 
the values of x,,...,XnTM are needed. Thus we will refer to a discrete 


•’^DIM 

state as either x^ or x^. ,• 

I L j J 


1’''2 


x^- or X 


Within lines 2600-4700 we are now concerned only with a single state 

,th 


t,Ji,j 


1,J2 


Letting P^. be the i row of P(v), we are thus only con- 


J. u 

cerned with maximizing the i*^" component of y' and v: 


y‘ (i) = max P.-Y + k. 


v(i) = argmax P-y + k. 

Vi^V. 


where V^. is the set of possible controls which can be applied to state i. 
Diagrammatical ly, 
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/ v; \ 
(/////) 

/ 

y' 



The element of y is the old value of state i; the element of y' 
is the new value is state i; the i row is v is the control applied to 
state i . We now proceed to task 2b. 

2b. Initial izinq the stack . USTACK is a stack of the five best 
controls and the associated values which are found during the search. 
They are arranged as follows: 



Best 

Second best 


Fifth best 


In each row is the new value, followed by the control applied (a 2 vector). 
The highest values are on top. But before the search begins., is is necessary 
to initialize the values to a low number, so that the real values later 
computed will fill the stock. 
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2c. Calculating the bounds on admissible control . In line 2800, 
UBOUND is called to calculate the bounds on admissible controls for 
state j,t. These bounds are returned in the vector vlow, vhigh, so that 

vlow(l) < Vj ^ vhigh(l) 

• • • 

• • ■ 

• • • 

vlow(dim)s s vhigh(dim) 

nd is the number of divisions that will be searched in each dimension, and 
vine is a vector of the increments that should be made in each dimension 
as the search proceeds. For dim=2, say, UBOUND would return information 
defining the following grid: 



high(l) 
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2d* Searching the coarse grid . Now having the bounds on the admis- 
sible controls for the state under question, we search through the possible 
controls to find a highest value. Namely, we find the v^- which maximizes 


max 

v,.V, 




+ k. 


The actual search for thef: highest value is done by the subroutine 
ITERATE (line 3000), and the highest values and associated controls are 
returned on VSTACK as described above. 

26* Defining the fine grid . Around the optimal coarse point we now 
define a much finer grid. This is done by again calling UBOUND and speci- 
fying ur, the optimal coarse control, as the center of the fine dearch. 


• • . • 

vxXXyvyyy 
x/Xx'iiXx' / X. 

KXXXXXXVX 

X X X [Tjoptimalx • » 

^ ^ X y X coarse x 

X X y X X point X 

xxXXXXxXX 

xxxxvxxxX 

• # 

and width of the fine search to be twice the distance between the coarse 
points. Then in lines 3400-3500 ur is read off the top of the stack. 

In lines 3600-3700 the stack is again set to low values. Then in line 3800 
UBOUND is called to find vlow, vhigh, and vine for the fine grid. 

2f. Searching the fine grid . Once the bounds on the fine grid are 
known, we call ITERATE once more to search the fine grid (line 4000). After 
this call, the maximum control and value are known and we can set y] to 
be the highest value on the stack (line 4200), set the i^^ row of v (the 
control matrix) to the best control of this state i (lines 4400-4600), and 
set km(i) to the incremental welfare for this control (line 4700). This 


coarse 

grid 

points 


fine 

grid 

points 


> 

X 

i 

i 

V 

'X 

K 

y 

'X 
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concludes the subroutine OPTIMIZE, as we now have solved Equations (1), 

(2), (3) given above. 

4.7 UBOUND (Page 5 of computer printout) 

Note : For the ECON model , UBOUND must be reprogrammed. We give a 

description here of the simplified version. 

Arguments: j.t - indices of a discrete state. 

tl - the time period subsequent to t 

ur - a control vector, the center of the grid 

prop - proportion of available control space which is to 
be searched. If prop = 0, all available control 
space is searched and ur is ignored. 

nd - number of divisions in each dimension 

Returns: vlow, vhigh, vine - vectors of the low, high and increments 

in each control dimension. 

Operation: 1. We look successively at each dimension i (line 1100); 

that is, the discrete state x. . given as an argument is a vector 

^j,t " (^j,t,r---’^j,t.DiM^ 

where x- ^ ^ is the amount of grain growing in the first aggregated crop, 
X. ^ 2 second aggregated crop, and so forth, with being 

the amount of grain stored in the last aggregated bin. We consider each 
dimension separately and first calculate 

(line 1200), the amount of grain in that state variable. Then according 
to whether i is an aggregated crop or bin, t is a planting time or not, 
we branch to different parts of the program. 

2. i is an aggregated crop . (True at line 20). 

Case 2a. Preplantinq or nonplanting season . (True at line 40). The 
only possible control is zero; obviously the amount planted must be zero 
in a nonplanting period. 
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Case 2b. Planting season . (True at line 50). We restrict the 
maximum sowing to the highest level representable in period tl. The 
highest level at tl is n(i,tl)* fact(i,tl) + xint(i,tl); hence the amount 
sown must be less than vhigh as given in line 2000. 


control must put level at 
tl in this range 


negative planting not allowed 
discrete values of variable i at tl 

For a lower bound on the control, notice that the amount sown must 
cause a level at least as large as the lowest grid point, which is xint + fact. 
Thus the control must be at least xint+ fact - ml . Also, the control must 
be positive; hence line 1900. 

3- i an aggregated bin (True at line 30). 

3a. First we calculate the amount of grain in storage assuming that 
no consumption, exports, or imports are made. This is done by summing the 
amount harvested from each*^ feeding crop ij into the aggregated bin (lines 
2300-2500). This amount available at tl with no consumption, imports or 
exports is called ml and is different from "stock" which is the amount of 
stock at time t. 

3b. The lowest stock level representable at tl is xint+fact, so 
the lowest control is that which leaves us at that level at time tl, 
namely fact + xint-ml (line 2700). 

tn»fact+xint 


2 • fact + xint 

fact + xint (lowest value) 
discrete values available at tl 


ml 


harvest^ 
stock ^ 


Diagram for planting season : 
h • fact + xint 

(n-l)fact+ xint 

present level #* 

at t £ 4 . ^ • j. 
fact + xint 
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3c. The highest possible control is somewhat hard to calculate. In 
the ECON model, too, this calculation will be done somewhat differently. 

Certainly if 

ml > n • fact+ xint 

then 


V < n • fact + xint -ml 


However , 


V + stock s 0 

so that 

V s -stock 


Hence, 

V < min(-stock, n • fact + xint - ml) 

If it so happens, however, that this minimum is less than vlow, we must 
therefore take 

vhigh = max(vlow, min(-stock, n • fact + xint - ml)) 


(lines 2800-2900). 

4. If prop = 0, we are done, so calculate vine and return (line 3700). 

5. If prop > 0, then we must recalculate the control bounds around 
the central control ur as shown below. This is carried out in lines 3300- 

3500. 




'arm 




'y// 
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4.8 ITERATE 

Arguments: vlow, vhigh, vine - lowest,, highest and increments for 

control space 

gamma - as described above. 

j,t - indices of the state whose control is to be 
optimized 

tl - time period subsequent to t. 

rho - per-period discount factor. 

Returns: vstack - a stack of the five controls with highest values. 

Operation: pi is used to store a row of the probability transition 

matrix P^-(v). Lines 1900-2300 and 3100-3500 are "written out" loops for 
speed. These loop through the possible controls. (The i= 1 statement 
at i = l is superfluous and should be removed.) 

The inner lines. 2400-2800, find the value of each control by the 
equation 


value = I P^.(vt.ii).y^(ii) + k.(vt.t) 


where vt is the control under examination, y^ is the old value vector, and 
PT{vt,-) is the row of the probability transition matrix for state i under 
control vt. Once the value has been calculated, STACK is called in line 
2900 to stack the value of the control and control itself should the value 
be one of the five best discovered so far in the search. 


4.9 UPDATE 

Arguments: rho - discount factor 

V - feedback control matrix 

km - imcremental welfare vector 

gamma - y^^^ as described in Section 1.4. 

’ ■ oldgam - y , 

^ 'n,m-l 

sigma - approximation to probability distribution 

Returns: gamma - as described in Section 1.4. 

oldgam - y 

'n,m 

sigma - updated approximation to probability distribution 
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bl, b2 - lower and upper bounds on welfare 
(The function of UPDATE is also overviewed in Sections 4.1 and 4.4.)^ Note: 

when UPDATE is past gamma = ^1 *"2 MML 

welfare, but when UPDATE is called thereafter with gamma = Yn,m. m>0, 
b, and b, will bound only the welfare of this particular feedback control 
matrix v. Thus the first call to update in MAIN is separated from the re- 
maining calls. , , ^ 

Operation: Following Equation 2.2, Section 1.4, UPDATE calculates: 


''^n,m+l 




n.m 


and also 

Vi = 

Mhat would thus be a straightforward matrix multiplication and addition 
is complicated by the fact that P is too large to be stored; one row is 
calculated at a time. 

Indices: Let us begin by sorting out the indices. The row of P 

which we are calculating, and then the element of y which can be calculated 
(notice, though, that an element of a requires all the rows of P) is the 

index i. 

Of course, a row of P corresponds to the outward transition proba- 
bility from some discrete state, and this discrete state is indexed by 
t, J' 2 » etc., as described in Section 2.5. 
tl is the time period after t. 

nlow and nhigh are bounds on the indices (that is, the positions in 
Y or a) of discrete states at time period tl. These are the states for 
which there is nonzero probability of going to it in the state i. Thus, 
these are the first and last elements in the i row of P which must be 

multiplied by y (see diagram): 
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Partitioning of P: 

Per 1 
Per 2 
Per NPER 



I 

nonzero • 




C\J 

u 

<u 

a. 



nonzero t 



CD 

Q. 


Thus a typical row of P, P^. , can be ignored except for nlow through nhigh: 


- -I _ _ _i 


- -\/ 


(_2_! Q 

— — . - — -L - - 


P f t Y 

nlow nhigh 


— nlow 
''' -nhigh 


In other words: 


nhigh 

*^i^ ■ ^ Pj(ii)Y(ii) 

n=nlow 


Step 1, Initialize sigg, the updated sigma, to zero. Later sigg->sigma 
as required. (Lines 1600-1700, line 5600). 

Step 2. Index a row of P, call it i or (j,t) (lines 2200-3500). Let 
vl be the control applied to state x^., the i^^ row of v (line 3700). Let 
pi be the i^" row of P(v) (line 3800). 

Step 3. Calculate = sum and P^.Yp^i^_j^ = sum2 by above equation . 

(lines 3900-4400). Also calculate <^n,P(Vp+i) ’by 


ns 


(line 4400). 
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Step 4. Set 

gamg = ^pPi(VlK,m * 

(Notice a dcval still remains in this expression.) Calculate devalue in 
line 4600, and calculate bound (cf. Section 1.5) in lines 4700-4900. 

Here 6vm2 is a calculated element of (pP(Vp+i)Cn + 

Step 5. (lines 5300-5600). Return the computed values in the 

proper arrays. 

4.10 PROM 

Function: Calculates a row of P(v). 

Arguments: j,t - discrete state index. 

V - control (to be applied to this particular state). 

Returns: pi - row of P(v) corresponding to probabilities leaving 

state j,t. 

Note: iSince the probability transition matrix is cyclic, certain elements 
of pi are constrained to be zero, but these elements are not actually 
zeroed by PROW. This must be remembered when using pi. 

Operation: 1) Consider the discrete states at tl, arranged in a 

dim-dimensional grid. (See illustration, dim=2.) We must somehow approxi- 
mate the continuous probability distribution on this space, by a discrete 
probability distribution. 
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To simplify the problem a bit, we can assume that our discrete probability 
distribution is the product of dim independent probability distributions: 

^ij = Pi-Pj l^i<h(l,t) l<j<h(2,t) 

and thus reduce our problem to finding an approximation to a one-dimensional 
continuous distribution: 



A matrix P, dimensioned dimxg, holds each of the dim independent 
probability distributions in successive rows. Lines 1400-2200 compute the 
discrete probabilities for the first NAGG state variables, lines 2400-3100 
compute the probability distributions for the remaining NBIN state variables, 
and then lines 3300-4500 multiply the independent distributions together 
appro friately to calculate the discrete probability at each point on the 
dim-dimensional grid. 

Steg_l. For each crop, calculate the mean of the expected amount 
planted at the next time period under control v. This is done in lines 
1500-1800. If tl is a pre-planting period, then there is only one level 
for the discretization (namely 0), hence the probability of going to that 
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State is one (line 2100). Otherwise APPROX is called to the pDF and stored 
in the i^*^ line of P (line 1900). 

Step 2 . For each bin, calculate the expected amount of grain stored 
at the next step. This is simply the present amount plus all harvest 
(lines 2600-2900). Again, APPROX calculates the PDF and stores it in the 
i^^ 1 ine of P. 

Step 3 . Recall that a discrete state is ordered by t, jp j'2» etc. 
Line 4400 calculates 

DIM 

pi(i) = n p(line ii, point j-) 
ii=l 

the product distribution as above. 

4.11 APPROX 



Arguments: ml - mean of probability distribution 

sigl - standard deviation of probability distribution 

xint, fact - define the first discrete level (xint+fact) 
and distance between discrete levels (fact) 

n - number of discrete levels 

line - line of matrix a in which to place probability 
approximations 

Returns: a - matrix for discrete probabil ity distribution (only one 

row will be affected, namely "line") 



81 


Operation: 1) The subroutine first checks to see that ml is between 

xint + fact (the lowest discrete point) and n* fact+xint (the highest dis- 
crete point). If ml is outside this range (it shouldn't be if UBOUND works 
properly), then APPROX prints an error message and aborts. Error detected 
is lines 1000-1500. 

2) Just to make sure, set ml in range if it is outside range (line 
1900). 

3) Zero out probability distribution a(line,*) (lines 2000-2100). 

4) Each probability point will be computed by numerical integration 
of the continuous curve. The number of points for each point in the dis- 
crete approximation is nd and is computed in line 2200. 

5) nbl - the first index for which there is any significant proba- 
bility (see figure) 



nt2 - the last index for which there is any significant probability, 
ntl - the index just to the left of the mean. 
nb2 - the index just to the right of the mean. 

Note: ntl = nb2 if mean of distribution lies exactly on a discrete point. 

This special case is taken into account in the program. 

arm - 2.4 + sigl, i.e. , the distance to which there is any significant 
probability, or the distance to the first or last discrete point from the 
mean, whichever is smallest. 

6) Numerical integration (line 2900). This is a first approximation 
to a discrete PDF. 

7) Probability refinement (lines 3200-6100) coaxes the mean of the 
discrete PDF to be the same as the mean of the continuous PDF. Let fh be 
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the mean of the crude discrete PDF from numerical integration. Then 


nt2 

m = I (i*fact+ xint)*pi 
i=nbl 


To coax m to m, we define and a .2 such that 
ntl 

m= I (i*fact + xint)*a,*pi + I 
i=nbl i=nb2 


(i*fact+ xint)*a 2 *pi 


and also 



ntl nt2 

I Q,-’®! + I 

i=nbl ^ ^ i=nb2 


pi*a2 


letting 


pi 


A 


ntl 

i=Li 


pi 



nt2 

I 

i=nb2 


pi 


si ^ I i*pi 
i=nbl 


A 

s2 ^ I i*pi 
i=nb2 


we then have 

1 ~ ^^^Pj *^2^2 

in = Oj^'fact’Sl + a 2 *fact*s 2 + xint 


pi, si, p2, s2 are calculated in lines 3200-4100, with lines 4300-4700 
taking care of the exceptional case ntl = nb2. 

We now rewrite the equations for ap a 2 as follows: 

1 = a^p^ + 02P2 

m = aj(fact*sl + xint*Pj) + a2(fact*s2 + xint*Pj^) 
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Redefine Sj~fact*si + xint*Pj, S 2 = fact*s2 + xint'Pj, (lines 4800, 4900), 
so that 

Pi P2W«i\ /I 

Si S 2 / \ / ■ V m 

Lines 5100-5300 solve this equation for Oj, 02 * Lines 5400-6100 then itiul- 

Pnbl Pnti and Ppb 2 through p ^^2 by a^, adjusting if 

necessary for ntl = nb2. 
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APPENDIX C 

A DIFFERENTIAL THEORY OF MARKOV CONTROL 
Steven N. Jones 


Abstract 


We consider the problem of controlling a Markov Process so as 
to minimize the long-run discounted (or undiscounted) cost. A new 
approach is taken, based on a matrix M which represents the difference 
in future state occupation caused by different starting states. Simple 
expressions result for the derivatives of the limiting state probability 
vector 7t(u) and cost J(u) with respect to changes in the applied control 
u. Using these derivatives, explicit formulas are derived for ir(u') 
and J(u') where u' differs from u in the control of a single state, and 
for this case it is shown that the sign of J(u') - J(u) depends on a very 
simply calculated discriminant. This leads to several new necessary 
and sufficient conditions for an optimal u*, which hold for both the 
discounted and undiscounted cases: optimality, first-order necessary 
conditions on the derivative are shown to be sufficient, Varaiya's 
necessary and sufficient condition for an optimal dual variable is ex- 
tended to the discounted case, as is his bound B(u) > J(u) - J*, and several 
previous results are reproven from the differential "perspective. 
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A DIFFERENTIAL THEORY OF MARKOV CONTROL 

Steven N. Jones 
Scientific Systems, Inc. 

^ Cambridge, f'lA 02138 

CONTENTS 

1. Introduction 

2. The Markov Control Problem 

3. Problem Formulation 

4. A Differential Theory 

5. Optimality Conditions and Bounds 

6. References 

1, Introduction 

The Markov control problem is defined in Section 2, previous work 
on the problem is reviewed briefly and the approach to be taken in this 
paper is outlined and the major results summarized. Section 3 gives a 
more mathematical formulation of the problem, and derives a succinct 
algebraic formulation of the problem which is proved equivalent in Sec- 
tion 4. 

Section 4 comprises the differential theory: the differential state 

occupation matrix M is defined, derivatives are defined of •rr(u) and J(u), 
and simple expressions are derived for them in terms of M. An explicit 
formula for changes in tt and J under statewise control policy changes is 
found, and the Monotonicity Theorem is proved. This theorem states that 
if u' differs from u in the control of a single state, then the sign of 
J(u')-J(u) depends on the sign of the discriminant 6^.+A^.c(u), where 
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c(u) IS a dual variable at u. ej«, = k(u') - k(u) (k is the vector of 
incremental costs associated with each state, e, is the i‘'' row of the 

identity matrix) and eia, = P(u' ) - P(u) . i„ fact, J(a(u'-u)tu) is mono- 
tonic in a. 

New optimality conditions are given in Section 5, whose proofs rely 
on the Monotonicity Theorem. It is proved that state-wise optimality 
is equivalent to global optimality, that global optimality is guaranteed 
by non-negative derivative, and the necessity and sufficiency of a dual 
variable is proven for the discounted case. Varaiya's bound B(u)^J(u)-J* 
IS also extended to the discounted case, and some previous results are 
reproven from the differential perspective. 

2- The Markov Control Problem 

Consider a perfectly observable Markov process x^, t = 0,l,..., with 
finite state space X = {!,... ,s} . and a set of available controls U(i) for 
each 1 eX. We assume that by choosing the controls u^ according to a 
stationary control pol icy u = (u(l) ... u(s)) e U(l)x. . .xU(s) = U, so that 

^t ~ ® Markov process wil have a stationary state 

transition matrix P(u). At each time t there is an incremental cost q^, 
or reward -q^, from the Markov process which depends on the state x^ and 
on the applied control u^ = u(x^) : q^=k.(u(i)) under control policy u if 
x^-i. Thus the statistics of q^, t = 0,l,..., depend on the control policy 
not only through the functions k.(u), i e X, but also through the statis- 
tics of x^, t = 0,l,... which are determined by Xq and P(u). 

The problem considered here, which we call the Markov Control Problem. 
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is to find stationary controls which minimize the long-run discounted 
or average cost. The long-run average cost starting in state j under 
control policy ueU is defined as: 

(2.1) oJ(u) = 1 ™e(^- 

Which we call the "undiscounted" cost. It is well known that this 
limit converges for stationary control policies (Doob, [1]), and to 
guarantee that the long-run average cost is independent from the starting 

state, we make the following assumption: 

Strict Erqodicity Assumption . For any ueU, there is a it(u) such that 

(2.2) 1 im P(u)^ = 1 jt(u) 

t-K» 

where 1^= (1 . • • 1) ' • * 

Although most all of our results hold under the more general "single 

ergodic class" assumption (Varaiya [2]). which also guarantees that j] 
will not depend on j, we will restrict ourselves to the above condition 

for simplicity in the presentation. 

Another type of cost frequently encountered is "discounted" cost; 

for a discount factor 0<p<l: 

(2.3) J?(u) = E{i p\ (u(x^))|Xq = j| (jeX) 

^ ^t=0 t 

(2.3.5) J^(u) = (Jj(u) ... Js(u))' 
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In general even under the strict ergodic assumption, but there 

is a close relationship between the discounted and undiscounted costs: 
there exists a "dual variable" c(u), which is defined in Section 4 for 
any u e U, such that tt(u)c(u) = 0 and 

(2.4) jP(u) = ^ jl(u) + c(u) 

Thus 7r(u)j'^(u) = J^(u)/(l-p). 

.^ai^kov Control Problem, as considered in this paper, is to mini- 
mize jP(u) subject to ueU for a particular 0<p^l. Since is in 
general a vector, it is not immediately clear that all elements of can 
be minimized simultaneously. However, by assuming perfect state obser- 
vation (i.e., assuming that the controls u^ may depend on x^), and by 
assuming that U is compact and P,k are continuously dependent on u, it 
can be proved that the elements of can be minimized simultaneously 
and an optimum stationary control u*eU exists which achieves this global 
minimum (Kushner, [3]). In fact, u* will be optimal over all feedback 
controls (Kushner, [3]). 

Many researchers have addressed the Markov control problem, finding 
necessary and sufficient conditions for the optimality of u"^, methods for 
finding bounds on J^(u*), and algorithms for computing successive stra- 
tegies whose cost approach the minimum. Most of this work has centered 
around some form of the following equation: 

c* = min (pP(u)c* + k(u) - J^(u)j.) 


(2.5) 
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Any solution c* to this equation is called an optimal dual variab le, and 
there is no confusion in notation as the c(u) mentioned above equals c* 
when u is an optimal control. It is known for p=l (Varaiya, [2]), and 
has been assumed for general p (and will be proved in this paper), that 
if c* is an optimal dual variable, then the minimizer of the right side 
is an optimal control policy. Furthermore, if u* is optimal, then there 
exists a c* (which we will show equals c(u*)) which satisfies: 

(2.5.5) c* = pP(u*)c* + k(u*) - J^(u*)i 

Equations (2.5) and (2.5.5) thus constitute a necessary and sufficient 
condition for u* to be optimal, and it is interesting to review the 
previous results on the Markov control problem from this viewpoint. 

Howard Algorithm . Consider (2.5) for p = 1 in the following form: 

(2.6) c* + J^(u*)l= min (P(u)c* + k(u)) 

ueU 

Any solution c* of the above equation is an optimal dual variable 
(since it satisfies Eq. (2.5)), and it can be checked that c* + jl(u*)l 
is also an optimal dual variable. Howard [4] and Schweitzer [5] showed 
under certain conditions that optimal dual variables are in a sense 
"stable fixed points" of the above equation, so that for the sequence 
beginning with an arbitrary Vq, and 

(27) V. ,1 = min (P(u)v- + k(u)), i = 0,1,. . . 
ueU 

the v.'s approach optimal dual variables, v - v^ ^ J^(u*)i, as we would 
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expect from (2.6), and thus the minimizing u^'s approach minimum cost. 
Odoni then showed [6] that for each i = 0,l,..., the largest element of 
- v^- upper bounds J(u*), and the smallest element lower bounds J(u*). 
One slight problem with this algorithm is that the conditions for 
convergence are not fully general, but in most cases it is the most 
practical algorithm to use. An analogous algorithm for the discounted 
case is: 

(2.8) V.., = min (pP(u)v- + k(u)) 

ueU V 

and v^. is assured to converge to a definite dual variable since p<l; 
however, to the author's knowledge, no analogy to the Odoni bound has 
been formulated. 

Varaiya Algorithm (p=l only). For the undiscounted case, Varaiya 
[2] defined Q(u) = P(u) - I, and a "Hamiltonian" H(u,c) = Q(u)c + k(u) . Then 
Eq. (2.5) can be written as 

(2.9) J^(u*)i= min (Q(u)c* + k(u)) = min H(u,c*) 

ueU ueU 

Varaiya proved the necessary and sufficient properties of (2.5) in this 

form. He then showed that for any c, min H(u,c) = H(u' ,c) : 

ueU 

(2.10) min min H-(u,c) < J^(u*) < J^(u') < max min H-(u,c) 

ieX ueU ' " " ■ ieX ueU ’ 

and gives a scheme for modifying c to bring the left and right sides of 
(2.3) closer together, so that c->c*, and H-»- H(u*,c*) . This algorithm 
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converges under the most general conditions (a single chain with transient 
states) but is formulated as a differential equation for c, rather than 
an algorithm which recursively computes a discrete series of c's. 

Earlier work, and most of the results we have not reviewed here, 
have also relied on some form of Eq. (2.5), and we refer the reader to 
Ross [1], Kushner [3], Howard [4], or Bertsekas [8]. 

The Differential Approach . Our work takes a different approach to 
the problem, based on the concept of differential state occupation . We 
define m. .(u) to be the difference in total expected future occupation 

* J 

of state j in units of time if Xq has probability one of being i rather 
than probability distribution ir(u). Take e. as the row of the sxs 

J 

identity matrix. Then 

(2.11) m^.j = (e^. -7r)ej + (e^. -ir)Pej + (e^. -7r)P^ej + ... 

Linder the strict ergodicity assumption, this sum will be shown to converge 
for all i,j, and be continuous in u. The m.. can be arranged into a 

* \J 

differential state occupation M and 

00 

(2.12) M(u) = I (I-1tt(u))P^(u) 

t=0 

In this paper we will show that M is a useful theoretical tool in 
Markov control. It is, for example, related to derivatives of ir(u) and 
J(u), has important algebraic properties useful in Eq. (2.5), leads to 
new and stronger optimality conditions, and relates the discounted to the 
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undiscounted costs, for c(u) = M(u)k(u) . Let us review these applications 
in more depth. 

Take u, u'eU and let A = P(u' ) - P(u) . We can define a derivative 
of 7t(u) in the direction of A as: 


(2.13) 


dir _ 
dA " 


1 im 
e->0 


(P+eA) - it(P) 
e 


We will show that this limit always exists and that 
(2.14) ^=ttAM 


In addition, since M is a function of u, we can define a derivative 
for M and 


(2.15) ^ = 1 im M( . fO . = mam 

dA - e 

e-K) 

Eq. (2.14) and (2.15) can be considered differential equations for 
(ir(P+A), M(P+A)) in the independent variable A. When A consists of 
only one nonzero row, these equations can be solved analytically for 
(ir(P+A), M(P+A)) in terms of (tt(P), M(P)), and if A^. is the nonzero row 
of A, M^. is the i^*^ column of M, then 


V(u')^ 


’Tr(P+A)' 


V(u)' 

1 


M(u'), 


M(P+A), 


M(u). 


M^.(u)A^M(u)^ 


Notice for an arbitrary A, the sequence (7 t,M)(P), (Tr,M)(P+ej^A 2 ) , 
(iT,M)(P+ejAj^+e 2 A 2 ), leading to (tt,M)(P+A) can be recursively computed 
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from (2.16). Another application of Eq. (2.16) is our 

Monotonicity Theorem . For P(u')-P(u) having a single nonzero 
row i, 


(2.17) A.M(u)k(u) + 6 = 


iff 


J'(u‘) 


J^(u) 


where e|6 = k(u ' ) - k(u) , elA^. = P(u' ) - P(u) . 0 

All of these results extend to the discounted case, as a discounted 
is defined as: 

oo 

(2.18) mP(u) = I p^(I -iTr)P^(u) 
t=0 

the differential equations (2.14) and (2.15) are supplemented by one for 
adding an additional row for M^(u') in Eq. (2.16), and the Monotonicity 
Theorem holds with slight modification. 

Consider next the algebraic properties of M. Let Q=pP-I. Then for 

0 ^ p ^ 1, 


(2.19) qPmP = M V = iTf - I 

so by taking c(u) = M^(u)k(u) , we see that c(u) solves (2.5.5): 

(2.20) pPc + k-jh = Qc + c+k-ljrk = (hr - I)k + c + k - jbrk = c 
and thus the optimal dual variable c* in (2.5) can be taken to be 


(2.21) c* = M(u*)k(u*) 



97 


The proofs of optimality conditions in Section 5 will be facilitated 
by the following algebraic properties, which are interesting in their 
own right: 

(2.22) irM*^ = = 0 0^ 1 

(2.23) Q"^ = -(MP+j^iTr) Ogp<l 

(2.24) JP*(l-p) = J^+ (l-p)*M^k 0< p< 1 

Eq. (2.24) exhibits the relationship between discounted and undiscounted 
cost, and we see that by "normalizing" the discounted cost by a factor ■ 
of (1-p), as p->l. This normalization will be assumed hereafter 

in the paper. 

The differential theory described above leads to new and strengthened 
necessary and sufficient conditions on u*. First, it is proved that if 
a control u is optimal with respect to changes in the controls of single 
states, (i.e., u' differs from u in only one u(i)), then u is (globally) 
optimum. Obviously the converse holds so this is a necessary and suf- 
ficient condition. 

Second, let u e U, and 

(2.25) $(u) = {P(u')-P(u), k(u') - k(u))|ueU} 

Then it is shown that 

(2.26) > 0 for all (A,k) e $(u) 

is a necessary and sufficient condition for u to be optimal. 
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A third result is an extension of Varaiya's necessary and sufficient 
condition: 

(2.27) J^(u)^ = min (Q^(u')c + k(u')) = min H(c,u') 

u'eU u'eU 

to the discounted case. In addition, Varaiya's bound B(u)>J*-J(u) is 
extended to the discounted case. 

We are presently working on improved algorithms based on this result 
and the new optimality conditions. 

3. Problem Formulation 

The state space of the Markov process is X= s}, the stationary 

control space U is a compact cartesian product U = U(l)x, . .xu(s) , k^. : U(i)-»-R 
are continuous functions for each i e X, and P : is continuous 

with the strict ergodic property (Eg. (2.2)) holding for each P(u), ueU. 

Take : U->R according to Eg. (2.1), and for 0<p<l take : U^R 
according to (2.3.5) times a normalization factor (1-p). As we said ear- 
lier, this normalization factor insures that: 

(3.1) lim JP(u) = J^(u)l 
P^l 

which we shall prove in Section 4. For any 0<p<l, u^^* is called p-optimal 
iff every element of J^(u*)<J^(u) for all ueU. 

To arrive at a more succinct mathematical statement of the Markov 
control problem, consider an alternate expression for 0<p<l. Let 
n?.(u) be the expected total discounted future occupation of state j under 



99 


policy u (in units of time), given that XQ=i. That is, including a 
normalization factor (1-p) for consistency: 

00 

(3.2) nf .(u) = (1-p) I P^e.p^(u)ei 

where e^ is the i^*’ row of the sxs identity matrix. By arranging the 
n?.(u) into an sxs matrix n*^(u) we have: 

* J 

(3.3) nP(u) = (1-p) I pV = (l-p)d-pP)"^ 

t=0 

and it follows that 

(3.4) JP(u) = n^(u)k(u) 

We will show that lim n^(u) =1 tt(u) = lim P^(u), so that (3.4) holds for 
. P>1 t-x» 

p=l also if II'^(u) is defined to be Ijr(u). 

We can now formulate the Markov control problem more succinctly. For 

p<l, is uniquely specified by the equation: 

(3.5) = -(l-p)l 

where QP=pP-I. Although (3.5) holds for p=l, one additional constraint 
is needed to uniquely specify n^. In Section 4 we will see that 1^ 
for all 0<p<l, so this constraint with (3.5) uniquely specifies and 
the Markov control problem can then be written: 

(3.6) jP(uP*) = min {nP(u)k(u) iQ^n^ = -(1-p) I, nPl=n 

ueU 
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We take Eq. (3.6) as the formal definition of the problem, 0<p<l. 

4. A Differential Theory 

Let (u(l) u(2) ... u(i) ... u(s)) = ueU be given and suppose 
u' = (u(l) u(2) ... u'(i) ... u(s)) differs from u only in the control 
applied to i. The major results of this section are explicit expressions 
for tt(u'), J^(u'), and M(u') in terms of ir(u) and M(u), and the Mono- 
tonicity Theorem (Eq. (2.17)). These results lead to new optimality 
conditions in Section 5. 

The development begins with discussion of the new matrix M and its 
properties. The derivatives of ir(P), J(P) and M(P) with respect to 
changes in P can then be expressed in terms of M. We will write a dif- 
ferential equation in P(u) + a(P(u' ) - P(u)) where a is the scalar parameter 
to be varied between 0 and 1, and by noting that P(u') differs from P(u) 
in only one row, we can solve the differential equation for 7 r(a), M(a) 
and J'^(a). Letting a = 1 we have the result mentioned above. Note that 
this result applies to both the discounted and undiscounted cases. 

For any u e U, 0 < p< 1, define A(u) = 1-1 (u) . Notice that A*^ = A, and 
that AP = PA. Define: 

oo 

(4.1) M^(u) = I pV(u) for 0<p^l 

t=0 

and M^(u) = A. 

Since the magnitudes of all eigenvalues of P are no greater than one, 
this sum must converge for p<l (and uniformly). To see that it must 
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converge for p = 1 also, recall that lim p'^ = rrr by the Strong Ergodicity 
Assumption, and since (P - Ijr)*^ = (AP)*^ = AP*^ = - jj:, lim (P-ljr)'^ = 0. 

Thus the geometric series: 

(4.2) I (P-br)" = I AP" = M^(u) 

n=0 n=0 

must converge (and uniformly also). In fact, since tt is continuous in 
u (Varaiya, [2]), and P is continuous in u by definition, must also be 
continuous in u. Also can be defined in closed form by evaluating 
the left side of Eq. (4.2) and 

(4.3) = (I - p(P - br))'^ forO<p<l 

M has many interesting properties, some of which are given in the 
following theorem. Recall that Q(u) = pP(u) - I . 

Theorem 1 . a. Ql^=-(l-p)_l 

b. ttQ = -(1-p) 

c. irM = Ml_ = 0 

d. QA = AQ = A if p = 1 

e. AM = MA = -A 

Proof: (a) through (c) follow easily from the definitions, (d) is 

due to Ql^= ttQ = 0 when p=l. For (e), we see that 

(4.4) pMP = i p^"" = 1 pV = M-A 

t=0 t=l 



102 


so MQ= -A. Also 

(4.5) pPM = I p^'^^PAP^ = I = M-A 

t=0 t=0 

since PA = AP, thus QM=-A. I 

For p<l, Q is invertible and it can be checked from the above re- 
lations that 

(4.6) Q ^ = -M - hr 

It is this "inverse" property which makes M particularly useful in 
derivations. 

relates to and to for p< 1, 

(4.7) nP=(l-p) I pV = (1-p) I (p^AP^ + p^IttP^) 

t=0 t=0 

= (l-p)MP+hr = (l-p)MP + n^ 

Since M is continuous in u, we see that lim n^ = n^, so equality of the 

P^l 

first and last matrices in Eq. (4.7) holds for p=l also. 

Using (4.7), we can express in terms of and 

(4.8) JP = n^k = (l-p)Mk + J^(u) 

The quantity M(u)k(u) appears so often we define it to be c(u), and 
we will later see that this vector represents the relative costs of the 
states under k, or in effect is a "dual variable" of u. 



103 


We now turn to the application of M in the calculation of deriva- 
tives. The notion of a feasible direction is a preliminary concept. 

Definition . Let L={(P(u), k(u))lueU} and for any ueU call 
$(u) = L - (P(u), k(u)) the set of all feasible directions from u. The 
convex hull of L is denoted L. ■ 

Lemma . L satisfies the Strict Ergodicity Assumption iff L does. ■ 
If (A,6)e$(u), define the one-sided derivatives 


(4.9) 


nP(P + eA) - nP(P) 


(4.10) 




if the limits exist. 

dr[P djp 

Theorem 2 . Let u e U be given and let (A,6)e$(u). Then 

jmP 

and ^ 7 — exist for all 0<p<l and 
dA - - 


(4.11) 

dRP _ 
dA " 

PRPamP 


(4.12) 

dJP . 
dA,6 

= nP(6 + pAMk) 



dM^ _ 
dA 

[M^AM^ p = 1 


(4.13) 

[pM^AM^ + brA( pM^ - ) 

0 < p < 1 


If e(A,6) satisfies the Strong Ergodicity Property for small enough e, then 
the above derivatives are two-sided. 

Before turning to the proof, we v/ill show that Eq. (4.11) can be 
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derived from simple intuitive reasoning. Consider first a change 

A = ee'.(e. -e. ), a small perturbation in P which adds probability e 
‘ J J 2 

to p.. and subtracts probability e from p.. . For small e, what is 

iJl ij2 

7T. (P + ce'. (e . - e. )) -ir.(P)? In changing only one row in P we expect the 
Markov process to run, intuitively speaking, as it normally would except 
when leaving state i. When the process is in state i, however, there is 
an added probability of e of going to and e less probability of going 
to J 2 . We can explore the change in overall behavior by analyzing the 
effect of each occupation of i. 


Recall that m^ is the difference in occupation of state k by 
starting in rather than ir, and m^ is the difference in occupation 
of k by starting in J 2 rather than ir. Thus 


(4.14) 


""jlk ■ "’j2k 


represents the difference in future occupation of state k when starting 
in rather than J 2 » and e times (4.14) must be the difference in total 
occupation of k each time the Markov process is in state i. Since state i 
occurs with frequency , we expect an average difference in occupation 
of state k to be: 

(4.15) 


Now any cA can be expressed as: 


(4.16) 


y ee'-e. A.. = J eeiA.. (p. -e. 

i.j'l "^1 ’-^l i,jj iJi h ^2 
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00 

since ^ A. . e . =0, and so 

^-^1 h 

(4.17) ^^(P)-tt^(P + cA) - h.k-% 

1 j J 2 i L c 

= I "’/ij ("'o,k> = 

1 5 J 2 i i 

and therefore: 

(4.18) it(P + A) - ir(P) = eirAM 

Let us now prove Theorem 2 formally. 

Proof : For e ^ 0 let = n^(P + eA) , = pP + peA-I , and for e > 0 take 

D^= (nP-ng)/e. Since 

(4.19) = -d-p)I - pen^A 

(4.20) ngQQ = -(I-P)I 

we can get an equation for by subtracting (4.20) from (4.19) and dividing 
by e: 

(4.21) = -pngi 

For p<l, e>0, Eq. (4.21) has a unique solution for since is 
invertible. For p=l. however, (4.21) yields only s-1 linearly independent 
equations, but the one additional independent condition 

(4.22) = 0 
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specifies uniquely for all p. It can be checked that D^=pIl^AM*^ is 
a solution to the above equations: 

(4.23) (pnPAMP)QQ = -pn^AA = -pH^A 

and Eq. (4.22) is satisfied since M*^l_= 0. Thus 

(4.24) litn. D = lim. pH^AM*^ = pH^AM^ 

e-^ ^ e-*G ^ 

Eq. (4.12) follows from (4.24) and the chain rule. 

Eq. (4.13) can be derived for p<l using (4.7) rewritten in this form 

(4.25) MP=Y^(n^-n^) (0<p<l) 

dni 

For the case p= 1, can be calculated by defining a in complete 
analogy to IT^. I 

Consider again the situation u=(u(l) ... u(i) ... u(s)) and 
u' = (u(l) ... u'(i) ... u(s)). The above derivatives can be used to 
solve for n*^(u'), J^(u'), and M^(u') in terms of n*^(u) and M^(u). Let 
A=P(u') -P(u); A has only a single non-zero row and there exists a A^. 
such that A = elA^. . Also 6= k(u') - k(u) = ej6^. for some scalar 6^.. Let a 
be a scalar parameter, and define 

(4.26) (P(a),k(a)) = (P(u) +a(P(u')-P(u)),k(u) +a(k(u')-k(u))) 

Then for any value of a between zero and one inclusive. 


(4.27) (P(a),k(a)) e $(u) 
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where the overbar indicates the convex hull. The derivatives of all 
quantities exist and are two-sided for 0<a<l, and are one-sided for 
a = 0 and a= 1, by Eq. (4.27). We will find n^(u') and M*^(u') by writing 
a differential equation for n(a) and M^(a), solving the differential equa- 
tion, and then taking a= 1. To begin, let M*? be the column of M^, 

J 

Vj(a) = A^.M^(a) for j = l,...,s. Then 


dv^(a) dM?(a) 

(4.28) = A, zrr = A.M^AM? 


= A. j— 
1 da 


but this expression is a function of v9 and v^ since 

J 


(4.29) A-MW = p(vf ... vP)e'.A.M9 = pv?v^ 

* -L o I 1 J I J 


PIq* A mP = ,^wPwP 


Thus we can solve first for v?(a) and then all of the other Vj and get: 

o 

(4.30) v?(a) = -J— 

^ l-pavP(O) 

Since M|(a) exists and is finite for a=l, and v^(a) is continuous, we 
J J 

must have also: 


(4.31) l-pvj(O) = 1-pA.Mj > 0 (0<p^l, i,je{l,...,s}) 

an ancillary fact we will use in proving the Monotonicity Theorem. 

To continue, we can next solve for n^(a), since the derivative of 
the column of n^, 

J 
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(4.32) = pn?v5 


Again solving first for 11^, and then IIj, we get 


av^(O) 

(4.33) n^Ca) = n?(0) + n?(0) — 

J J ^ l-pav^CO) 


Letting a=l, and writing (4.33) in matrix form, we arrive at an expression 
for n^(u' ) : 


(4.34) nP(u') = nP(u) + 1-PA. M7 ^i^i^^^ 

where M? is the i^*^ column of M? and n? is the i^^ column of n?. 
Once we have the v^'s, it is easy to solve for since 

vJ 


(4.35) 


dM](a) 

da 


"H 


and (4.35) must have a solution in the same form as the solution to Eq. (4.32) 
(4.36) M^(u') = n*(u) + — Wm5(u)4,h' 

To get for p<l, use Eq. (4.7): 


(4.37) mP(u') = (n'^(u') - n^(u')) 


We now have analytical expressions for n^(u') and M*^(u') in terms of 
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n^(u), M^(u), n^(u) and M^(u). Since is a function of and 11^, we 
see that the triple (n^(u'), M^(u'), M^(u')) can be explicitly calculated 
from the triple (n^(u), rr(u), M'^Cu)), when u' is different from u in the 
control of a single state. 

Let us now turn to a calculation of J^(u') in terms of the last 
triple; this is certainly possible since 

(4.38) JP(u') = nP(u')k(u') 

We spare the reader the necessary algebra which reduces Eq. (4.38) to 
the following: 

6. + pA.M^k „ 

(4.39) J^(u') = J^(u) +— n?(u) 

1-pA.M. ^ 

where all of the quantities on the right side are taken at u. 

With this expression we can now prove the following theorem. 

Theorem 3 . (Monotonicity Theorem). Let u' differ from u in the 
control of a single state, elA^ = P(u') - P(u) , ej6^. = k(u' ) - k(u) . Then 



r \ 
> 





a. J^(u' ) - J^(u) 

= 

0 

if 

6. + pA^.c*^(u) 

> 


< 




< 


where c^(u) =M^(u)k(u). The inequality must be strict for at least one 
element of J^(u') - J^(u); hence the "if" is an "if and only if" for p=l. 

b. J(P + ae|A^, k + aelA^) is monotonic in a for 0^P<1. 



no 


Proof : (a) Recall from Eq. (4.31) that 1-pA^.M. must be greater 

than zero. Since the elements of n?(u) are non-negative, and n?(u) 
has at least one strictly positive element (namely n?^.(u)), (a) follows, 
(b) Follows from Lemma, Eqs. (4.39) and (4.31). I 

5. Optimality Conditions and Bounds 

In this section several new necessary and sufficient conditions for 
a global optimum u* are proved. In addition, several previous results are 
reproved or extended, as the differential viewpoint offers a new perspec- 
tive, and in most cases, a simpler proof. 

Consider the following conditions for a fixed ueU: 


Cl. 

JP(u') ^ Jf^(u) 

all u' 6 

U 



C2. 

J^(u) = min (Q^(u')c^ 

+ k(u')) 

where c*^ = 

mP(u 

)k(u) 


u'eU 





C3. 

JP(u) = min (pQ^(u')cP 

+ k(u')) 

where c = 

M (u 

)k(u) 


u'eU 





C4. 

6 + pAM(u)k(u) > 0 

all (A, 6) 

e $(u) 



C5. 

J'^(u') , /(u) 

for all u 

' e U s.t. 

for 

some i e X 



o' (j) = Li(j) unless 

j = i 

all jeX 

C6. 

dj'^(P(u),k(u)) ^ 0 
dA,6 = 

all (A, 6) 

6 $(u) 




Cl is of course a statement of global optimality. C5 and C4 are new; 



in 


C2 is known to be equivalent to Cl when p=l (Varaiya, [2]); C3 and C6 are 
new. 

Theorem 4 . All of the above conditions are equivalent. Thus, any 
condition implies u is a global optimum and the solution to the Markov 
control problem. 

We will need this preliminary lemma: 

Lemma 2 . Let u e U, Ogp^l, c^(u) = M*^(u)k(u) . Then 

(5.1) J^(u)l = Q*^(u)c*^(u) + k(u) 

Proof : Q^(u)c^(u) = Q^(u)M^(u)k(u) = -A(u)k(u) = (IjT-I)k(u) = J^(u)l^- k(u) ■ 

Proof of Theorem 4 : (Cl-‘-»-C2). Obviously Cl->-C2. For any u', 

) = Q^(u' )c^(u' ) + k(u' ) by Lemma 2. If C2 holds, then Jj^(u) g Q^(u‘ )c^(u) + 
k(u'). Since n*^(u) has only non-negative elements 

(5.2) n'^(u')Jj(u) = nP(u')lTT = Jj(u) 

< nP(u')QP(u')cP + nP(u')k(u') 

= (iTr(u') + (1 -p)mP(u')qP(u')cP + JP(u') 

= -d-p)c + /(u') 

Thus J^(u) = Jj^(u) + (l-p)c < J^(u' ) which is condition Cl. (This proof is 
in direct analogy to Varaiya [2]). 

(C2^C3). These two equations differ only by a constant factor 
(l-p)cP. 

(C2-<->C4). By Lemma 2, C2 is equivalent to the statement: 
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(5.3) (p(P + A) - I)c^+ k+ 6 > (pP-I)c‘^ + k 
which is equivalent to 

(5.4) pAM^k +6.^0 
which is C4. 

(C4-M-C5). Follows directly from the Monotonicity Theorem. 
(C5-HVC6). C6->C5 by the Monotonicity Theorem. Suppose then that 

condition -C6 holds, so that for some A, 6, 

(5.5) = nP(6 + pAM^k) ^ 0 

Since all of the elements of are non-negative, there must be an i e X 
such that 

(5.6) 6. +pA.M'^k<0 

Again, by the monotonicity theorem, J?(P + elA^, k + ej6^.) < J^.(P,k) which 
is -C2. ■ 

We now extend Varaiya's bound B(u) < J^(u) - J^(u*) to the dis- 
counted case. Recall the Hamiltonian 

(5.7) H(u,y) = Q(u)y + k(u) 

Extending this to the discounted case, so that H^(u,y) = Q^(u)y+ k(u) , 
we get an analogous result. 
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Theorem 5 . Let y be an arbitrary column vector, let 0<p<l, and 
choose u e U such that 

(5.8) H^(u,y) = min H^(u',y) 

u'eU 

Letjh = min H?(u,y), h = max H?(u,y). Then 
ieX ^ ieX ^ 

(5.9) hi + (l-p)y I JP* < J'^(u) < hi + (l-p)y 

Furthermore, if ji= h", then y = c*^* + ai where a is a scalar, J*^(u) = J^*, 
and H^(u,y) = J^(u*). Ifalsoiry=0, theny=c^*. 

Proof : Recall that -(l-p)I. Thus 

(5.10) n^(u)H^(u) = nPQPy + n^k = JP(u) - (l-p)y 
and 

(5.11) nP*(u*)H^(u*) = j'^*-(l-p) 

Since all of the elements of are non-negative and n^i= 
fol 1 ows . 

Now suppose h^=F. Since then = hi= H, J^*=J^(u), 

(5.12) H+ (l-p)y = J^(u) = J^(u) - (I-p)mPc^ 

SO 

y = (J^-H) + c^ 


i. Eq. (5.9) 
u = u^*, and 


(5.13) 
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and if 0 = tty then 0 = "ll) H = 


■ 
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